Qian, Luscombe and Gerstein [J. Molecular Biol. 313 (2001) 673-681]
introduced a model of the diversification of protein folds in a
genome that we may formulate as follows. Consider a multitype Yule
process starting with one individual in which there are no deaths and
each individual gives birth to a new individual at rate 1. When a new
individual is born, it has the same type as its parent with probability
$1 - r$ and is a new type, different from all previously observed
types, with probability $r$. We refer to individuals with the same type
as families and provide an approximation to the joint distribution
of family sizes when the population size reaches $N$. We also show
that if $1 \ll S \ll N^{1-r}$, then the number of families of size at least $S$
is approximately $CNS^{-1/(1-r)}$, while if $N^{1-r} \ll S$ the distribution
decays more rapidly than any power.

1. Introduction. Genome sequencing of various species has shown that
gene and protein-fold family sizes have a power-law distribution. Huynen
and van Nimwegen [19] studied six bacteria, two Archaea and yeast. Li, Gu,
Wang and Nekrutenko [28] and later Gu, Cavalcanti, Chen, Bouman and
Li [17] analyzed the genomes of yeast, the nematode C. elegans, fruit fly
(Drosophila melanogaster) and human. There have been several models
advanced to explain this phenomenon. Rzhetsky and Gomez [33] and Karev,
Wolf, Rzhetsky, Berezov and Koonin [23] (see also [25]) introduced a birth
and death model in which, when there are $i$ individuals in a family, a birth
occurs at rate $\lambda_i$ and a death occurs at rate $\delta_i$. They proved, as most readers
of this journal can easily verify, that if the birth and death rates are
second-order balanced, that is,

$$\lambda_{i-1}/\delta_i = 1 - a/i + O(1/i^2)$$
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2 R. DURRETT AND J. SCHWEINSBERG 

for some $a > 0$, then the stationary distribution of the family size is
asymptotically $Ci^{-a}$. See the Appendix of [23] or Example 3.6 on page 297 of [13]
for more details.

Qian, Luscombe and Gerstein [32] introduced an alternative model that
we will study in detail here. Consider a continuous-time Yule process with
infinitely many types. At time zero, a single individual of type 1 is born.
No individuals die, and each individual independently gives birth to a new
individual at rate 1. When a new individual is born, it has the same type
as its parent with probability $1 - r$, where $0 < r < 1$. With probability $r$,
the new individual has a type which is different from all previously observed
types. If the $k$th individual born has a different type from its parent, we say
that it has type $k$. Note that, as a consequence of this choice of labeling,
there are always type-1 individuals, but for $k \ge 2$, with probability $1 - r$
there are never any individuals of type $k$.

In this model one can think of the new types as resulting from mutations, 
where r is the probability of mutation. Alternatively, one could think of a 
Yule process with immigration in which each individual gives birth at rate 
1 — r and new immigrants arrive at rate r times the current population size. 
We refer to individuals with the same type as families. The goal of this
paper is to study the distribution of the family sizes at the time when the
population size reaches $N$.
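The model just described can be simulated through its embedded jump chain, since only the order of births matters for the family sizes at the time the population reaches $N$. The following is a minimal sketch under that assumption; the function name `simulate_family_sizes` and its arguments are illustrative and do not come from the paper.

```python
import random


def simulate_family_sizes(n_final, r, seed=None):
    """Simulate the multitype Yule process until the population size is n_final.

    Returns a dict mapping a type label k to the number of individuals of
    type k. Only the embedded jump chain is simulated: because every
    individual gives birth at rate 1, the parent of the next newborn is
    uniform over the current population.
    """
    rng = random.Random(seed)
    types = [1]          # types[i] is the type of the (i+1)st individual born
    sizes = {1: 1}
    while len(types) < n_final:
        birth_index = len(types) + 1
        if rng.random() < r:
            # New type, labelled by the index of the newborn.
            child_type = birth_index
        else:
            # Same type as a uniformly chosen parent.
            child_type = types[rng.randrange(len(types))]
        types.append(child_type)
        sizes[child_type] = sizes.get(child_type, 0) + 1
    return sizes
```

For instance, `simulate_family_sizes(20000, 0.018, seed=0)` produces one realization of the family sizes for parameter values used later in the simulations.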

1.1. Approximation to the family-size distribution. Let $T_N$ be the time
that the population size reaches $N$. Let $R_{k,N}$ be the number of individuals
of type $k$ at time $T_N$. Let $X_{k,N}$ be the fraction of individuals at time $T_N$
whose type is in $\{1, \ldots, k\}$. Let $V_{k,N}$ be the fraction of individuals at time
$T_N$, among those whose type is in $\{1, \ldots, k\}$, that are of type $k$. This means
that the fraction of individuals at time $T_N$ that are of type $k$ is $V_{k,N}X_{k,N}$
and the number of individuals of type $k$ at time $T_N$ is $R_{k,N} = NV_{k,N}X_{k,N}$.
Note that $X_{N,N} = 1$ and for $k = 1, \ldots, N - 1$, we have

$$X_{k,N} = \prod_{j=k+1}^{N} (1 - V_{j,N}). \tag{1.1}$$

The following proposition follows from well-known connections between Yule 
processes and Polya urns. We review these connections and prove this propo- 
sition in Section 2. 

Proposition 1.1. For each positive integer $k$, the limit

$$W_k = \lim_{N \to \infty} V_{k,N}$$

exists a.s. The random variables $W_1, W_2, \ldots$ are independent. We have $W_1 = 1$
a.s. Furthermore, $P(W_k > 0) = r$ for all $k \ge 2$ and conditional on the event
that $W_k > 0$, the distribution of $W_k$ is Beta$(1, k - 1)$.

Let $Y_{N,N} = 1$ and, for $k = 1, \ldots, N - 1$, let

$$Y_{k,N} = \prod_{j=k+1}^{N} (1 - W_j). \tag{1.2}$$

Let $\Delta = \{(x_i)_{i=1}^{\infty} : 0 \le x_i \le 1 \text{ for all } i \text{ and } \sum_{i=1}^{\infty} x_i = 1\}$. Note that the
sequence

$$(N^{-1}R_{k,N})_{k=1}^{\infty} = (V_{k,N}X_{k,N})_{k=1}^{\infty},$$

whose $k$th term is the fraction of the population having type $k$ at time $T_N$,
is in $\Delta$. Proposition 1.1 and equations (1.1) and (1.2) suggest that, for large
$N$, the distribution of this sequence can be approximated by $Q_{r,N}$, which
we define to be the distribution of the sequence in $\Delta$ whose first term is
$Y_{1,N}$, whose $k$th term is $W_kY_{k,N}$ for $2 \le k \le N$ and whose $k$th term is zero
for $k > N$. Theorem 1.2 below uses the coupling of the $X_{k,N}$ and $Y_{k,N}$ given
above to show that the distribution of $(N^{-1}R_{k,N})_{k=1}^{\infty}$ can be approximated
by $Q_{r,N}$ to within an error of $O(N^{-1/2})$. We prove this result in Section 3.

Theorem 1.2. We have $E[\max_{1 \le k \le N} |X_{k,N} - Y_{k,N}|] \le 5/\sqrt{N}$.
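Because the $W_k$ are independent with an explicit law, a sequence with distribution $Q_{r,N}$ is easy to sample directly. The sketch below follows the definition via (1.2); the function name `sample_q` is an illustrative choice, not notation from the paper, and the code samples the approximating sequence only, not the coupling with the Yule process.

```python
import random


def sample_q(n, r, seed=None):
    """Draw one sequence from Q_{r,N} with N = n.

    Returns a list whose entry k-1 is W_k * Y_{k,N} for k = 1, ..., n
    (the terms with k > n are zero and are omitted).
    """
    rng = random.Random(seed)
    w = [0.0] * (n + 1)            # w[k] holds W_k (1-indexed)
    w[1] = 1.0                     # W_1 = 1 a.s.
    for k in range(2, n + 1):
        if rng.random() < r:       # P(W_k > 0) = r
            w[k] = rng.betavariate(1, k - 1)
    y = [0.0] * (n + 2)            # y[k] holds Y_{k,N}
    y[n] = 1.0
    for k in range(n - 1, 0, -1):  # Y_{k,N} = (1 - W_{k+1}) Y_{k+1,N}
        y[k] = y[k + 1] * (1.0 - w[k + 1])
    return [w[k] * y[k] for k in range(1, n + 1)]
```

The entries sum to 1 because $W_kY_{k,N} = Y_{k,N} - Y_{k-1,N}$ for $k \ge 2$, so the sum telescopes.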

The distributions Qr,N first arose in the work of Durrett and Schweinsberg 
[15] and Schweinsberg and Durrett [34], who studied the effect of beneficial 
mutations on the genealogy of a population. The distributions Qr,N arose 
in that context because, shortly after a beneficial mutation, the number 
of individuals with the beneficial gene behaves like a supercritical branch- 
ing process, which means that the number with descendants surviving a 
long time into the future behaves like a Yule process. In this setting, r is 
the rate of recombination, and individuals descended from a lineage with 
a recombination get traced back to a different ancestor than other individ- 
uals, just as individuals descended from an individual with a mutation in 
the present model are of a different type than the others. Schweinsberg and
Durrett's [34] approximation had an error of $O((\log N)^{-2})$ because of deaths
and other complexities in the model, but Theorem 1.2 shows that the
distributions $Q_{r,N}$ give a much more accurate approximation to the family-size
distribution in the simpler model studied here. We note also that here it is
assumed that $r$ is fixed, whereas Schweinsberg and Durrett [34] considered
$r$ to be $O(1/(\log N))$.

1.2. A power law for the number of families of moderate size. Let $F_{S,N}$
denote the number of families at time $T_N$ whose size is at least $S$. Define

$$g(S) = r\,\Gamma\!\left(\frac{2-r}{1-r}\right) N S^{-1/(1-r)}. \tag{1.3}$$

The theorem below, which is proved in Section 4, shows that if $1 \ll S \ll N^{1-r}$,
then $g(S)$ provides a good approximation to the number of families
of size at least $S$, in the sense that $|F_{S,N} - g(S)|/g(S) \to 0$ as $N \to \infty$.

Theorem 1.3. There are constants $0 < C_1, C_2 < \infty$ so that

E[\Fs,N - g{S)\] < Cig{S)[S~'/' + {NS-'/^'-^'^y'^'] + C2. 

Note that the two terms in brackets, a negative power of $S$ and a negative
power of $NS^{-1/(1-r)}$, are both small and $g(S)$ is large
when $1 \ll S \ll N^{1-r}$.

Theorem 1.3 confirms Qian, Luscombe and Gerstein's [32] power law but
it also conflicts with their results. Since they considered the number of folds
that occur exactly $V$ times rather than at least $V$ times, it follows from
differentiating the right-hand side of (1.3) that for large $N$ we would expect
a decay with the power $b = 1 + 1/(1 - r)$. This quantity is always larger
than 2, while they observed powers $b$ between 0.9 and 1.2 for eukaryotes and
between 1.2 and 1.8 for prokaryotes. Despite this discrepancy, they were able
to fit their model by starting the process at time zero with $N_0 > 1$ families.
For example, for Haemophilus influenzae they took $r = 0.3$, $N_0 = 90$, and
ran the process for 1249 generations. For C. elegans they took $r = 0.018$,
$N_0 = 280$ and ran for 18,482 generations.

Figure 1 shows one simulation of the system with $r = 0.018$, $N_0 = 1$, and
$N = 20{,}000$. In contrast to biologists who do a log-log plot of the number
of gene families of size $k$ (see, e.g., Figure 1 in [18], or Figure 8 in [23]),
we look at the tail of the distribution and plot the log of the family size
on the x-axis and the log of the number of families of at least that size
on the y-axis. The curve fit by Karev et al. [23] has asymptotic power 1.9
in contrast to the 2.018 that comes from our formula, but note that the
straight line fitted to our simulation of the distribution function has slope
0.91. Figure 2 shows the average of 10,000 simulations of the process with
the C. elegans parameters. The straight line shows that Theorem 1.3 very
accurately predicts the expected number of families until the log of the
family size is 4. This simulation also shows that the power law breaks down
when $S \approx N^{1-r}$, which motivates our next topic.
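The prediction $g(S)$ of (1.3) and the exponent $b = 1 + 1/(1-r)$ for families of exactly a given size can be evaluated numerically and compared with tail counts from any list of simulated family sizes. The helper names below (`g`, `density_exponent`, `tail_count`) are illustrative and not from the paper; this is a sketch, not the code used to produce the figures.

```python
import math


def g(s, n, r):
    """Predicted number of families of size at least s, following (1.3)."""
    return r * math.gamma((2.0 - r) / (1.0 - r)) * n * s ** (-1.0 / (1.0 - r))


def density_exponent(r):
    """Power b = 1 + 1/(1 - r) obtained by differentiating (1.3) in S."""
    return 1.0 + 1.0 / (1.0 - r)


def tail_count(family_sizes, s):
    """Empirical F_{S,N}: number of families whose size is at least s."""
    return sum(1 for size in family_sizes if size >= s)
```

With $r = 0.018$, `density_exponent(0.018)` is approximately 2.018, the value quoted above.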

1.3. Sizes of the largest families. Recall that $R_{k,N}$ is the number of
individuals of type $k$ at time $T_N$. Proposition 1.4 below identifies the limiting
distribution of the size of the large families.

Proposition 1.4. For each positive integer $k$, the limit

$$Z_k = \lim_{N \to \infty} N^{r-1}R_{k,N}$$

exists almost surely. The distribution of $Z_1$ is the Mittag-Leffler distribution
with parameter $1 - r$, which has density

$$g(x) = \frac{1}{\pi(1-r)} \sum_{k=0}^{\infty} \frac{(-1)^{k+1}}{k!} \sin(\pi(1-r)k)\,\Gamma((1-r)k + 1)\,x^{k-1}, \qquad x > 0. \tag{1.4}$$

For $k \ge 2$, conditional on the event that the $k$th individual born has type
$k$, the distribution of $Z_k$ is the same as the distribution of $MB_k^{1-r}$, where
$B_k$ has the Beta$(1, k - 1)$ distribution, $M$ has the Mittag-Leffler distribution
with parameter $1 - r$ and $M$ and $B_k$ are independent.
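The series (1.4) can be summed numerically for moderate $x$. The sketch below truncates the series after a fixed number of terms (the parameter `terms` and the function name are assumptions made for illustration) and works with logarithms of the term magnitudes to avoid overflow. For large $x$ the alternating series suffers from cancellation, so this is only meant as a sanity check. When $r = 1/2$ the Mittag-Leffler density with parameter $1/2$ is known to be $\pi^{-1/2}e^{-x^2/4}$, which gives a convenient test case.

```python
import math


def mittag_leffler_density(x, r, terms=60):
    """Truncated series (1.4) for the Mittag-Leffler density, parameter 1 - r.

    Requires x > 0. The k = 0 term vanishes because sin(0) = 0, so the sum
    starts at k = 1.
    """
    alpha = 1.0 - r
    total = 0.0
    for k in range(1, terms + 1):
        # log of Gamma(alpha*k + 1) * x^(k-1) / k!
        log_mag = (math.lgamma(alpha * k + 1.0) - math.lgamma(k + 1.0)
                   + (k - 1) * math.log(x))
        sign = 1.0 if k % 2 == 1 else -1.0   # (-1)^(k+1)
        total += sign * math.sin(math.pi * alpha * k) * math.exp(log_mag)
    return total / (math.pi * alpha)
```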

The fact that $Z_1$ has the Mittag-Leffler distribution was first proved by
Angerer [2], who was motivated by the study of bacterial populations. He
considered a model that is equivalent to our model, except that he referred to
our type-1 individuals as nonmutant cells, and individuals of all other types
as mutant cells. Theorem 6.1 in [2] gives the Mittag-Leffler limit when the
probability of mutation is a fixed constant. See also Theorems 1.7 and 1.8
of [21], where the Mittag-Leffler distribution arises as a limiting distribution
in an urn model that is closely related to our model. We mention another
proof of the Mittag-Leffler limit at the end of Section 1.4, and we prove
Proposition 1.4 for $k \ge 2$ in Section 5.

Fig. 1. One simulation of the duplication model with C. elegans parameters,
r = 0.018 and N = 20,000.

Fig. 2. Average of 10,000 simulations of the duplication model with C. elegans
parameters, r = 0.018 and N = 20,000. Straight line is the prediction of Theorem 1.3.



The moments of $M$ are given by $E[M^m] = \Gamma(m + 1)/\Gamma(m(1 - r) + 1)$ for
$m > 0$ (see Section 0.5 of [30]). Also, we have $E[B_k^m] = \Gamma(m + 1)\Gamma(k)/\Gamma(m + k)$
for $m > 0$. Since $P(Z_k > 0) = r$ when $k \ge 2$, it follows that for $k \ge 2$ and
$m > 0$ we have

$$E[Z_k^m] = \frac{r\,\Gamma(m+1)}{\Gamma(m(1-r)+1)} \cdot \frac{\Gamma(m(1-r)+1)\,\Gamma(k)}{\Gamma(m(1-r)+k)} = \frac{r\,\Gamma(m+1)\,\Gamma(k)}{\Gamma(m(1-r)+k)}.$$
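The moment formula for $Z_k$ combines the moments of $M$ with the moments of order $m(1-r)$ of $B_k$. A short numerical sketch makes the cancellation of $\Gamma(m(1-r)+1)$ explicit; the function names are illustrative, and log-gamma is used in `z_moment` only to avoid overflow for large $k$.

```python
import math


def mittag_leffler_moment(m, r):
    """E[M^m] for the Mittag-Leffler distribution with parameter 1 - r."""
    return math.gamma(m + 1.0) / math.gamma(m * (1.0 - r) + 1.0)


def beta_moment(k, s):
    """E[B_k^s] for B_k ~ Beta(1, k - 1), valid for s > 0."""
    return math.exp(math.lgamma(s + 1.0) + math.lgamma(k) - math.lgamma(s + k))


def z_moment(k, m, r):
    """E[Z_k^m] = r Gamma(m+1) Gamma(k) / Gamma(m(1-r) + k) for k >= 2."""
    return r * math.exp(math.lgamma(m + 1.0) + math.lgamma(k)
                        - math.lgamma(m * (1.0 - r) + k))
```

For example, `z_moment(5, 2, 0.3)` agrees with `0.3 * mittag_leffler_moment(2, 0.3) * beta_moment(5, 2 * 0.7)` up to rounding.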



The next result, which is proved in Section 5, proves what was observed
in the simulation. The expected number of families of size at least $xN^{1-r}$
decays faster than any power of $x$. Indeed, it decays faster than exponentially
in $x$, and the decay is fastest when $r$ is small.



Proposition 1.5. There exist constants $C_1$ and $C_2$ such that for all
$x \ge 1$, we have

$$\lim_{N \to \infty} \sum_{k=1}^{N} P(R_{k,N} > xN^{1-r}) \le C_1 e^{-C_2 x^{1/r}}.$$



1.4. A new Chinese restaurant. Our model has a close relation to a
construction called the "Chinese restaurant process," which was first proposed
by Dubins and Pitman. We describe here a two-parameter version of the
process, which is discussed in Pitman [29, 30]. Suppose $0 \le \alpha < 1$ and $\theta > -\alpha$.
Consider a restaurant with infinitely many tables, each with an unbounded
number of seats. The first customer sits at table 1. Suppose, for some $n \ge 1$,
that after $n$ customers have been seated, there are $k$ occupied tables, with $n_i$
customers at the $i$th table, so that $n_1 + \cdots + n_k = n$. Then, the $(n + 1)$st
customer sits at table $i$ with probability $(n_i - \alpha)/(n + \theta)$ and sits at an
unoccupied table, which we call the $(k + 1)$st table, with probability $(\theta + k\alpha)/(n + \theta)$.
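The seating rule of the two-parameter process translates directly into a sequential simulation. This is a minimal sketch of that rule; the function name `chinese_restaurant` is an illustrative choice.

```python
import random


def chinese_restaurant(n, alpha, theta, seed=None):
    """Seat n customers by the (alpha, theta) rule; return the table sizes.

    Assumes 0 <= alpha < 1 and theta > -alpha, as in the text.
    """
    rng = random.Random(seed)
    tables = [1]                       # the first customer sits at table 1
    for m in range(1, n):              # m customers are already seated
        k = len(tables)
        u = rng.random() * (m + theta)
        new_table_weight = theta + k * alpha
        if u < new_table_weight:
            tables.append(1)           # open table k + 1
            continue
        u -= new_table_weight
        for i, n_i in enumerate(tables):
            u -= n_i - alpha           # weight of occupied table i
            if u < 0:
                tables[i] += 1
                break
        else:
            tables[-1] += 1            # guard against floating-point round-off
    return tables
```

The weights sum to $(\theta + k\alpha) + \sum_i (n_i - \alpha) = m + \theta$, which is why the uniform variable is scaled by `m + theta`.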

For any $N$, the Chinese restaurant process gives rise to a random partition
$\Pi_N$ of $\{1, \ldots, N\}$, where $i$ and $j$ are in the same block of $\Pi_N$ if and only
if the $i$th and $j$th customers are seated at the same table. That is, the
partition $\Pi_N$ consists of blocks $B_{1,N}, \ldots, B_{k,N}$, where $B_{j,N}$ consists of all integers
$i$ between 1 and $N$ such that the $i$th customer is seated at the $j$th table. Let
$|B_{j,N}|$ denote the number of the first $N$ customers at the $j$th table. Then (see
[30]), the distribution of the $\Delta$-valued sequence $(N^{-1}|B_{1,N}|, N^{-1}|B_{2,N}|, \ldots)$
converges as $N \to \infty$ to the Poisson-Dirichlet distribution with parameters
$(\alpha, \theta)$. This distribution is defined as follows. Let $(D_j)_{j=1}^{\infty}$ be a sequence of
independent random variables such that $D_j$ has the Beta$(1 - \alpha, \theta + j\alpha)$
distribution. Then the sequence whose $k$th term is $D_k \prod_{j=1}^{k-1}(1 - D_j)$ has the
Poisson-Dirichlet distribution with parameters $(\alpha, \theta)$. The Poisson-Dirichlet
distributions were studied extensively by Pitman and Yor [31]. See also [30]
and [4] for further applications of these distributions.

An important special case of the Chinese restaurant process is when $\alpha = 0$.
Then, we may assume that the $(n + 1)$st customer sits at a new table with
probability $\theta/(n + \theta)$ and otherwise chooses one of the previous $n$ customers
at random and sits at that person's table. In this case, if $\pi$ is a partition of
$\{1, \ldots, N\}$ with $k$ blocks of sizes $n_1, \ldots, n_k$, one can check that

$$P(\Pi_N = \pi) = \frac{\theta^{k-1}}{(1 + \theta)(2 + \theta) \cdots (N - 1 + \theta)} \prod_{i=1}^{k} (n_i - 1)!. \tag{1.5}$$

This leads to the famous Ewens sampling formula [16]. The Ewens sampling
formula describes the family-size distribution in a Yule process with
immigration when immigration occurs at constant rate $\theta$. When there are
$n$ individuals in the Yule process, they are each splitting at rate 1 and
immigration occurs at rate $\theta$, so the probability that the $(n + 1)$st individual
starts a new family is $\theta/(n + \theta)$. For another application of the Ewens
sampling formula, consider a population in which each lineage experiences
mutation at rate $\theta/2$ and whose ancestral structure is given by Kingman's
coalescent (see [24]), meaning that each pair of lineages merges at rate 1.
Working backward in time, when there are $n + 1$ lineages, coalescence occurs
at rate $n(n + 1)/2$ while mutations occur at rate $\theta(n + 1)/2$. Consequently,
the probability of having mutation before coalescence is $\theta/(n + \theta)$. Because
Kingman's coalescent is a good approximation to the genealogy in populations
of fixed size, the Ewens sampling formula is a standard model for gene
frequencies in populations of fixed size. However, this model does not lead
to the power-law behavior that has been observed in some data.

Note that our model can be viewed as a variation of the Chinese restaurant
process in which the $(n + 1)$st customer sits at a new table with constant
probability $r$, rather than with probability $\theta/(n + \theta)$, and otherwise picks
one of the previous $n$ customers at random and sits at that person's table.
One can define the random partition $\Theta_N$ of $\{1, \ldots, N\}$ such that $i$ and $j$
are in the same block if and only if the $i$th and $j$th customers are seated
at the same table. In our branching process interpretation, this means that
the $i$th and $j$th individuals born have the same type, so the family sizes in
our model correspond to block sizes of $\Theta_N$. It is straightforward to derive
an analog of the Ewens sampling formula in this case. If $\pi$ is a partition of
$\{1, \ldots, N\}$ into $k$ blocks of sizes $n_1, \ldots, n_k$, and if $a_1 < a_2 < \cdots < a_k$ are the
first integers in these blocks, then

$$P(\Theta_N = \pi) = \frac{r^{k-1}(1 - r)^{N-k}}{(N - 1)!} \prod_{i=1}^{k} (n_i - 1)! \prod_{j=2}^{k} (a_j - 1).$$

This formula depends on $a_2, \ldots, a_k$ as well as the block sizes $n_1, \ldots, n_k$, so the
random partition $\Theta_N$ is not exchangeable. Nevertheless, one can still look
for approximations to the distribution of the block sizes. We see from
Theorem 1.2 that the distributions $Q_{r,N}$ play the role of the Poisson-Dirichlet
distributions in this model. Because the population size in Yule processes
grows exponentially, these distributions provide a plausible model of gene
frequencies in growing populations, and they do lead to power-law behavior,
as shown in Theorem 1.3. Furthermore, the approximation error in
Theorem 1.2 of $O(N^{-1/2})$ is the same order of magnitude as the error when the
distributions of the block sizes of the partitions $\Pi_N$ above are approximated
by the Poisson-Dirichlet distributions.
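The analog of the Ewens sampling formula for the constant-probability seating rule can be checked in exact arithmetic by comparing it with the step-by-step seating probabilities. In the sketch below a partition is encoded as a list giving the block label of each customer in order of arrival; this encoding and the function names are assumptions made for illustration, and `r` is expected to be a `Fraction`.

```python
from fractions import Fraction
from math import factorial


def sequential_probability(assignment, r):
    """Multiply the seating probabilities customer by customer."""
    prob = Fraction(1)
    counts = {assignment[0]: 1}
    for m in range(1, len(assignment)):      # m customers already seated
        block = assignment[m]
        if block in counts:
            # Join an existing table: pick one of its members among m people.
            prob *= (1 - r) * Fraction(counts[block], m)
            counts[block] += 1
        else:
            prob *= r                        # start a new table
            counts[block] = 1
    return prob


def closed_form(assignment, r):
    """r^(k-1) (1-r)^(N-k) / (N-1)! * prod (n_i - 1)! * prod_{j>=2} (a_j - 1)."""
    n = len(assignment)
    first, sizes = {}, {}
    for i, block in enumerate(assignment, start=1):
        first.setdefault(block, i)           # a_j: first integer in the block
        sizes[block] = sizes.get(block, 0) + 1
    k = len(sizes)
    prob = r ** (k - 1) * (1 - r) ** (n - k) / factorial(n - 1)
    for size in sizes.values():
        prob *= factorial(size - 1)
    for a in sorted(first.values())[1:]:     # skip a_1 = 1
        prob *= a - 1
    return prob
```

Swapping the arrival positions of two blocks of equal size generally changes the value, which illustrates the lack of exchangeability noted above.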

Finally, the Chinese restaurant process when $\alpha = 1 - r$ and $\theta = 0$ can be
used to give another proof of Proposition 1.4 when $k = 1$. This argument was
pointed out to us by Wolfgang Angerer, Anton Wakolbinger and a referee,
and also appears implicitly in earlier unpublished notes of Jim Pitman. Given
our multitype Yule process, we can obtain a Chinese restaurant process with
$\alpha = 1 - r$ and $\theta = 0$ by saying that each individual born in the Yule process
sits at the same table as its parent, unless it has type 1 in which case it
starts a new table. Thus, the number of type-1 individuals is the number of
occupied tables, so the Mittag-Leffler limit follows from Theorem 31 of [30].
See also Angerer and Wakolbinger [3].

1.5. Connections with preferential attachment. In our model, gene families
grow at a rate proportional to their size. This is similar to the behavior of
Barabási and Albert's [8] preferential attachment model in which one grows a
graph by adding a vertex at each time and connecting that vertex to $m$ existing
vertices chosen with probabilities proportional to their degrees. Through
simulations and heuristic arguments, Barabási and Albert concluded that the
fraction of vertices of degree $k$ converged to a limit $p_k \sim Ck^{-3}$. This result
was later proved rigorously by Bollobás, Riordan, Spencer and Tusnády [11].

Fueled by the observation of power laws for degree distributions in the
Internet, collaboration networks and even sexual relations in Sweden, this work
touched off a flurry of activity. To remedy the difficulty that the power was
always 3 in the Barabási-Albert model, Krapivsky, Redner and Leyvraz [26]
introduced a model in which attachment to vertices of degree $i$ was
proportional to $a + bi$, and were able to achieve any power in $(2, \infty)$. These
results, published in Physical Review Letters, omit a few details, but work
by Kumar, Raghavan, Rajagopalan, Sivakumar, Tompkins and Upfal [27]
and Cooper and Frieze [12] further generalizes the model and provides
rigorous proofs of the power laws.

The preferential attachment models are different from ours because adding
an edge changes the degree of two vertices. However, if one considers directed
graphs and analyzes only the out-degree, then taking $\alpha = 1 - r$ and $\beta = 0$ in
the Cooper-Frieze model gives a model identical to ours and a power law
that is proved in their Section 6.1. Later work of Bollobás, Borgs, Chayes
and Riordan [10] investigates a directed graph model which contains our
result as a special case and for which they derive a power law. We point
out also that a construction similar to that given in Section 1.1 was
developed by Berger, Borgs, Chayes and Saberi [9] in the context of preferential
attachment graphs.

In addition to recent work, Simon [35] considered the following model
of word usage in books, which he also applied to scientific publications,
city sizes and income distribution. Let $X_i(t)$ be the number of words that
have appeared exactly $i$ times in the first $t$ words. He assumed that (a) the
probability that the $(t + 1)$st word is a word that has already appeared $i$
times is proportional to $iX_i(t)$; (b) there is a constant probability $\alpha$ that
the $(t + 1)$st word is a word that has not appeared in the first $t$ words. This
of course is exactly our model, but even this is not the earliest reference.
It appeared in work of Yule [37] who considered a model of the number of
species in a given genus. Both Yule [37] and Simon [35] argued that the
model gives rise to power-law behavior. See Aldous [1] for a more recent
account and a simple explanation for the power law.

While our model has been considered a number of times, our results are
more precise. In most cases investigators have considered the limit of the
fraction of vertices of degree $k$ for fixed $k$. Exceptions are Bollobás, Riordan,
Spencer and Tusnády [11] who were able to prove results for $k \le N^{1/15}$ and
Cooper and Frieze [12] who could handle $k \le N^{1/21}$. In contrast, our results
hold for the entire range over which the power law is valid and show how
the power law breaks down for larger values.

2. Branching processes and Polya urns. In this section we review some 
well-known connections between Polya urns and continuous-time branching 
processes, which will be useful later in the paper. Athreya and Karlin [6] 
showed how to embed the urn process in a continuous-time branching pro- 
cess. This technique was reviewed in [7]. See [20] for a thorough survey of 
recent developments and generalizations. 

Recall the following version of Polya's urn model. Suppose we start with
$a$ white balls and $b$ black balls in the urn. We then draw a ball at random
from the urn. If the ball we draw is white, we return it to the urn and add
an additional white ball to the urn. If the ball we draw is black, we return
it to the urn and add another black ball. This process can be repeated
indefinitely. To see the connection with branching processes, consider a
two-type branching process in which there are no deaths and each individual
gives birth at rate 1. If at some time there are $a$ individuals of type 1 and $b$
individuals of type 2, then the probability that the next individual born will
have type 1 is $a/(a + b)$, which is the same as the probability that the next
ball added to an urn with $a$ white balls and $b$ black balls will be white. It
follows that the distribution of the number of type-1 individuals when the
population size reaches $N$ is the same as the distribution of the number of
white balls in the urn when the number of balls in the urn is $N$.

Let $\zeta_i = 1$ if the $i$th ball added to the urn is white, and let $\zeta_i = 0$ if the $i$th
ball added to the urn is black. Fix a positive integer $N$. Let $S \subset \{1, \ldots, N\}$,
and let $S^c = \{1, \ldots, N\} \setminus S$. Let $|S|$ denote the cardinality of $S$. It is easy to
check that for $a, b \ge 1$,

$$P(\zeta_i = 1 \text{ for } i \in S \text{ and } \zeta_i = 0 \text{ for } i \in S^c) = \frac{(a + |S| - 1)!\,(b + N - |S| - 1)!\,(a + b - 1)!}{(a - 1)!\,(b - 1)!\,(a + b + N - 1)!}. \tag{2.1}$$

Since the right-hand side of (2.1) depends only on $|S|$ and not on the
particular elements of $S$, the sequence $(\zeta_i)_{i=1}^{\infty}$ is exchangeable. By de Finetti's
theorem, there exists a probability measure $\mu$ on $[0, 1]$ such that for all $N$
and all $S \subset \{1, \ldots, N\}$, we have

$$P(\zeta_i = 1 \text{ for } i \in S \text{ and } \zeta_i = 0 \text{ for } i \in S^c) = \int_0^1 x^{|S|}(1 - x)^{N - |S|}\,\mu(dx), \tag{2.2}$$

where $\mu$ is the distribution of $\lim_{N \to \infty} N^{-1}|\{i \le N : \zeta_i = 1\}|$, the limiting
fraction of white balls in the urn when we start with $a$ white balls and $b$
black balls. It follows from Theorem 1 in Section 9.1 of Chapter V of [7] that
$\mu$ is the Beta$(a, b)$ distribution. One can also see this by checking that the
right-hand sides of (2.1) and (2.2) agree in this case.
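The agreement of (2.1) and (2.2) when $\mu$ is Beta$(a, b)$ can be confirmed in exact arithmetic for integer $a$ and $b$, using the fact that the Beta function at positive integers is a ratio of factorials. The function names below are illustrative, and the sketch covers integer parameters only.

```python
from fractions import Fraction
from math import factorial


def urn_sequence_probability(a, b, n, s):
    """Right-hand side of (2.1) for a set S with |S| = s and N = n."""
    return Fraction(
        factorial(a + s - 1) * factorial(b + n - s - 1) * factorial(a + b - 1),
        factorial(a - 1) * factorial(b - 1) * factorial(a + b + n - 1),
    )


def direct_probability(a, b, colors):
    """Probability of a specific color sequence (1 = white, 0 = black)."""
    prob, white, black = Fraction(1), a, b
    for c in colors:
        if c == 1:
            prob *= Fraction(white, white + black)
            white += 1
        else:
            prob *= Fraction(black, white + black)
            black += 1
    return prob


def beta_function(p, q):
    """B(p, q) = (p-1)! (q-1)! / (p+q-1)! for positive integers p, q."""
    return Fraction(factorial(p - 1) * factorial(q - 1), factorial(p + q - 1))


def beta_mixture(a, b, n, s):
    """Right-hand side of (2.2) when mu is the Beta(a, b) distribution."""
    return beta_function(a + s, b + n - s) / beta_function(a, b)
```

The direct product of draw probabilities depends on the sequence only through the number of white draws, which is the exchangeability used above.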

Proof of Proposition 1.1. Clearly $W_1 = 1$ a.s. because $V_{1,N} = 1$ a.s.
Assume now that $k \ge 2$. Let $S_k$ be the set of all $i$ such that the type of
the $i$th individual born is in $\{1, \ldots, k\}$. Let $\mathcal{H}_k$ be the $\sigma$-field generated by
the sets $S_k, S_{k+1}, \ldots$. Note that if $j > k$, then $V_{j,N}$ is $\mathcal{H}_k$-measurable for all
$N$. Therefore, to prove the proposition, it suffices to show that, for all $k \ge 2$,
the limit $W_k$ exists a.s. and satisfies the following conditions:

1. $P(W_k > 0) = r$.

2. The conditional distribution of $W_k$ given $W_k > 0$ is Beta$(1, k - 1)$.

3. $W_k$ is independent of $\mathcal{H}_k$.

Note that the third condition implies that $W_k$ is independent of $(W_j)_{j=k+1}^{\infty}$.

Enumerate the elements of $S_k$ as $i_1 < i_2 < i_3 < \cdots$. Define a sequence
$(\zeta_j^{(k)})_{j=1}^{\infty}$ such that $\zeta_j^{(k)} = 1$ if the $i_j$th individual has type $k$ and $\zeta_j^{(k)} = 0$
otherwise. Note that $i_j = j$ for $j \le k$. Also, $\zeta_j^{(k)} = 0$ for $j = 1, \ldots, k - 1$.
Recall from our conventions for labeling the types that if the $k$th individual
to enter the population has a new type, then it has type $k$. Therefore, $\zeta_k^{(k)} = 1$
if and only if the $k$th individual has a new type, and whether or not this
individual has a new type does not affect the births of individuals of types
greater than $k$. Thus, $P(\zeta_k^{(k)} = 1 | \mathcal{H}_k) = r$. If $\zeta_k^{(k)} = 0$, then clearly $W_k = 0$.
Because of the connection between branching processes and Polya urns, if
$\zeta_k^{(k)} = 1$, then the sequence $(\zeta_{k+j}^{(k)})_{j=1}^{\infty}$ has the same distribution as the Polya
urn sequence $(\zeta_i)_{i=1}^{\infty}$ defined above when $a = 1$ and $b = k - 1$. Furthermore,
the values of $\zeta_j^{(k)}$ do not affect the births of individuals of types greater
than $k$, so this relationship holds even after conditioning on $\mathcal{H}_k$. It follows
that, conditional on $\zeta_k^{(k)} = 1$, the random variable $W_k$ has a Beta$(1, k - 1)$
distribution and $W_k$ is independent of $\mathcal{H}_k$. □

Now, fix $N$ and to simplify notation, write $X_k$, $Y_k$, $V_k$ and $R_k$ for $X_{k,N}$,
$Y_{k,N}$, $V_{k,N}$ and $R_{k,N}$, respectively. We will use this notation throughout the
rest of the paper when the value of $N$ is clear from the context. Let $\mathcal{F}_k$ be
the $\sigma$-field generated by the random variables $V_j$ and $W_j$ for $j \ge k + 1$. It
follows from (1.1) and (1.2) that $X_k$ and $Y_k$ are $\mathcal{F}_k$-measurable. Let $\mathcal{G}_k$ be
the $\sigma$-field generated by the random variables $V_j$ for $j \ge k + 1$ and $W_j$ for
$j \ge k$.

We can write $W_k = \xi_k \tilde{W}_k$, where $\xi_k$ has a Bernoulli$(r)$ distribution and is
independent of $\mathcal{F}_k$, and $\tilde{W}_k$ has a Beta$(1, k - 1)$ distribution and is
independent of $\xi_k$ and $\mathcal{F}_k$. Since $E[\tilde{W}_k] = 1/k$ and $E[\tilde{W}_k^2] = 2/[k(k + 1)]$, we have
$E[W_k | \mathcal{F}_k] = r/k$ and $E[W_k^2 | \mathcal{F}_k] = 2r/[k(k + 1)]$. Note that $V_k = 0$ whenever
$W_k = 0$. On $\{W_k > 0\}$, define $\tilde{V}_k = R_k - 1$. Define $\tilde{V}_k = 0$ on $\{W_k = 0\}$. Then

$$V_k = \frac{1 + \tilde{V}_k}{NX_k} 1_{\{W_k > 0\}} = \left( \frac{1}{NX_k} + \left( \frac{\tilde{V}_k}{NX_k - k} \right) \left( \frac{NX_k - k}{NX_k} \right) \right) 1_{\{W_k > 0\}}. \tag{2.3}$$
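It follows from (2.2) that

$$\text{the conditional distribution of } \tilde{V}_k \text{ given } \mathcal{G}_k \text{ is Binomial}(NX_k - k, W_k) \tag{2.4}$$

because there are $NX_k - k$ individuals, after the first $k$, with types in
$\{1, \ldots, k\}$, and conditional on $\mathcal{G}_k$, each has type $k$ with probability $W_k$.
Therefore,

$$E[V_k | \mathcal{G}_k] = \left( \frac{1}{NX_k} + W_k \left( \frac{NX_k - k}{NX_k} \right) \right) 1_{\{W_k > 0\}} \tag{2.5}$$

and

$$E[V_k | \mathcal{F}_k] = E[E[V_k | \mathcal{G}_k] | \mathcal{F}_k] = r/k.$$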



3. Approximating the family-size distribution. In this section we prove
Theorem 1.2, which implies that the distribution $Q_{r,N}$ is a good approximation
to the family-size distribution in the Yule process with infinitely many
types. To prove this result, we need to show that the $X_k$, which are related
to the $V_j$ by (1.1), are close to the $Y_k$, which are likewise related to the $W_j$
by (1.2). We begin by showing that $E[X_k]$ and $E[Y_k]$ are the same.

Lemma 3.1. We have $E[X_k] = E[Y_k] = \prod_{j=k+1}^{N} (1 - r/j)$ for $1 \le k \le N$.

Proof. We prove the formula for $E[Y_k]$ by backward induction on $k$.
Clearly $E[Y_N] = 1$. Suppose the formula holds for some $k \ge 2$. Then

$$E[Y_{k-1}] = E[(1 - W_k)Y_k] = E[(1 - W_k)]E[Y_k] = \left(1 - \frac{r}{k}\right) \prod_{j=k+1}^{N} \left(1 - \frac{r}{j}\right) = \prod_{j=k}^{N} \left(1 - \frac{r}{j}\right).$$

To get the same formula for $E[X_k]$, first note that $E[X_{k,j}] = 1$ for $1 \le j \le k$. If
$n \ge k$, then conditional on $X_{k,n}$, the probability that the $(n + 1)$st individual
has a type in $\{1, \ldots, k\}$ is $(1 - r)X_{k,n}$. Therefore,

$$E[X_{k,n+1}] = \frac{nE[X_{k,n}] + (1 - r)E[X_{k,n}]}{n + 1} = \left(1 - \frac{r}{n + 1}\right) E[X_{k,n}],$$

so the formula for $E[X_k]$ follows by induction on $n$. □

Lemma 3.2. We have $(k/N)^r e^{-r^2/k} \le E[X_k] \le (k/N)^r e^{r/k}$ for $1 \le k \le N$.

Proof. By Lemma 3.1, we have $\log E[X_k] = \sum_{j=k+1}^{N} \log(1 - r/j)$. Note
that if $0 \le x < 1$, then $\log(1 - x) = -\sum_{m=1}^{\infty} (x^m/m)$. Summing, we see that if
$0 \le x \le 1/2$, $-(x + x^2) \le \log(1 - x) \le -x$. Therefore,

$$\log E[X_k] \le -\sum_{j=k+1}^{N} \frac{r}{j} = \frac{r}{k} - \sum_{j=k}^{N} \frac{r}{j} \le \frac{r}{k} - \int_k^N \frac{r}{x}\,dx = \frac{r}{k} + r \log \frac{k}{N} \tag{3.1}$$

and

$$\log E[X_k] \ge -\sum_{j=k+1}^{N} \left( \frac{r}{j} + \frac{r^2}{j^2} \right) \ge -\int_k^N \frac{r}{x}\,dx - \int_k^N \frac{r^2}{x^2}\,dx \ge r \log \frac{k}{N} - \frac{r^2}{k}. \tag{3.2}$$

The result now follows by exponentiating both sides in (3.1) and (3.2). □
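Lemmas 3.1 and 3.2 lend themselves to a quick numerical check. The sketch below verifies in exact arithmetic that the one-step recursion for $E[X_{k,n}]$ reproduces the product formula, and evaluates the bounds of Lemma 3.2 in floating point; the function names are illustrative and not from the paper.

```python
import math
from fractions import Fraction


def expected_x_recursion(k, n_final, r):
    """Iterate E[X_{k,n+1}] = (n E[X_{k,n}] + (1-r) E[X_{k,n}]) / (n+1) from n = k."""
    e = Fraction(1)                    # E[X_{k,k}] = 1
    for n in range(k, n_final):
        e = (n * e + (1 - r) * e) / (n + 1)
    return e


def expected_x_product(k, n_final, r):
    """Product formula of Lemma 3.1: prod_{j=k+1}^{N} (1 - r/j)."""
    p = Fraction(1)
    for j in range(k + 1, n_final + 1):
        p *= 1 - Fraction(r) / j
    return p


def lemma_3_2_bounds(k, n, r):
    """Lower and upper bounds (k/N)^r e^{-r^2/k} and (k/N)^r e^{r/k}."""
    base = (k / n) ** r
    return base * math.exp(-r * r / k), base * math.exp(r / k)
```

Passing `r` as a `Fraction` to the first two functions keeps the comparison exact; converting the product to `float` allows it to be compared with the bounds.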

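Lemma 3.3. We have $E[X_k^2(W_k - V_k)^2] \le r\left( \frac{1}{N^2} + \frac{2}{N^{1+r}k^{1-r}} \right)$ for $2 \le k \le N$.

Proof. By (2.3), we have

$$V_k - W_k = \left[ \left( \frac{\tilde{V}_k}{NX_k - k} - W_k \right) \left( \frac{NX_k - k}{NX_k} \right) + \left( \frac{1}{NX_k} - \frac{kW_k}{NX_k} \right) \right] 1_{\{W_k > 0\}}. \tag{3.3}$$

When we take the conditional expectation given $\mathcal{G}_k$ of the square of the
right-hand side of (3.3), the cross-term vanishes because (2.4) implies

$$E\left[ \frac{\tilde{V}_k}{NX_k - k} - W_k \,\Big|\, \mathcal{G}_k \right] = 0.$$

Since $X_k$ and $W_k$ are $\mathcal{G}_k$-measurable, using (2.4) again gives

$$E[(V_k - W_k)^2 | \mathcal{G}_k] = E\left[ \left( \frac{\tilde{V}_k}{NX_k - k} - W_k \right)^2 \Big|\, \mathcal{G}_k \right] \left( \frac{NX_k - k}{NX_k} \right)^2 1_{\{W_k > 0\}} + \left( \frac{1 - kW_k}{NX_k} \right)^2 1_{\{W_k > 0\}}$$

$$= \left( \frac{1 - kW_k}{NX_k} \right)^2 1_{\{W_k > 0\}} + \frac{W_k(1 - W_k)}{NX_k - k} \left( \frac{NX_k - k}{NX_k} \right)^2 1_{\{W_k > 0\}}.$$

Since $W_k$ is independent of $\mathcal{F}_k$ and the conditional distribution of $W_k$ given
$W_k > 0$ is Beta$(1, k - 1)$,

$$E[(V_k - W_k)^2 | \mathcal{F}_k] = \frac{1}{(NX_k)^2} \left( r - 2k \cdot \frac{r}{k} + k^2 \cdot \frac{2r}{k(k + 1)} \right) + \frac{NX_k - k}{(NX_k)^2} \left( \frac{r}{k} - \frac{2r}{k(k + 1)} \right) \le \frac{r}{N^2X_k^2} + \frac{r}{NkX_k}.$$

Thus, using Lemma 3.2, we get

$$E[X_k^2(W_k - V_k)^2] = E[X_k^2 E[(W_k - V_k)^2 | \mathcal{F}_k]] \le E\left[ \frac{r}{N^2} + \frac{rX_k}{Nk} \right] \le \frac{r}{N^2} + \frac{re^{r/k}}{N^{1+r}k^{1-r}} \le r\left( \frac{1}{N^2} + \frac{2}{N^{1+r}k^{1-r}} \right),$$

since for $k \ge 2$, we have $e^{r/k} \le e^{1/2} < 2$. □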



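Lemma 3.4. For every real number $a$, we have $E[X_k(X_k - Y_k)(W_k - V_k)(W_k - a)] = 0$.

Proof. Using the fact that $E[W_k - V_k | \mathcal{F}_k] = 0$ and that $W_k$ is
$\mathcal{G}_k$-measurable, we have

$$E[(W_k - V_k)(W_k - a) | \mathcal{F}_k] = E[W_k(W_k - V_k) | \mathcal{F}_k] = E[E[W_k(W_k - V_k) | \mathcal{G}_k] | \mathcal{F}_k] = E[W_k^2 - W_kE[V_k | \mathcal{G}_k] | \mathcal{F}_k].$$

Using (2.5) now, the above equals

$$E\left[ W_k^2 - W_k \left( \frac{1}{NX_k} + W_k \left( \frac{NX_k - k}{NX_k} \right) \right) 1_{\{W_k > 0\}} \,\Big|\, \mathcal{F}_k \right] = \frac{1}{NX_k} \left( kE[W_k^2 | \mathcal{F}_k] - E[W_k | \mathcal{F}_k] \right) = \frac{r}{NX_k} \left( \frac{2}{k + 1} - \frac{1}{k} \right).$$

It follows that

$$E[X_k(X_k - Y_k)(W_k - V_k)(W_k - a)] = E[E[X_k(X_k - Y_k)(W_k - V_k)(W_k - a) | \mathcal{F}_k]]$$

$$= E[X_k(X_k - Y_k)E[(W_k - V_k)(W_k - a) | \mathcal{F}_k]] = E\left[ (X_k - Y_k) \frac{r}{N} \left( \frac{2}{k + 1} - \frac{1}{k} \right) \right] = 0,$$

where the last equality follows from Lemma 3.1. □

Lemma 3.5. We have $E[(X_k - Y_k)^2] \le 3/N$ for $1 \le k \le N$.

Proof. Suppose $2 \le k \le N$. We will bound $E[(X_{k-1} - Y_{k-1})^2]$ in terms
of $E[(X_k - Y_k)^2]$. First, note that it follows from (1.1) and (1.2) that

$$X_{k-1} - Y_{k-1} = (1 - V_k)X_k - (1 - W_k)Y_k = X_k(W_k - V_k) + (X_k - Y_k)(1 - W_k). \tag{3.4}$$

Thus,

$$E[(X_{k-1} - Y_{k-1})^2] = E[X_k^2(W_k - V_k)^2] + E[(X_k - Y_k)^2(1 - W_k)^2] + 2E[X_k(X_k - Y_k)(W_k - V_k)(1 - W_k)]. \tag{3.5}$$

By Lemma 3.4 with $a = 1$, the third term on the right-hand side of (3.5)
vanishes. Using Lemma 3.3 and the fact that $E[(X_k - Y_k)^2(1 - W_k)^2] \le E[(X_k - Y_k)^2]$, we get

$$E[(X_{k-1} - Y_{k-1})^2] \le E[(X_k - Y_k)^2] + r\left( \frac{1}{N^2} + \frac{2}{N^{1+r}k^{1-r}} \right).$$

Since $X_N = Y_N = 1$, it follows that for $1 \le k \le N$, we have

$$E[(X_k - Y_k)^2] \le \sum_{j=k+1}^{N} r\left( \frac{1}{N^2} + \frac{2}{N^{1+r}j^{1-r}} \right) \le \frac{1}{N} + \frac{2r}{N^{1+r}} \sum_{j=2}^{N} \frac{1}{j^{1-r}} \le \frac{1}{N} + \frac{2r}{N^{1+r}} \int_1^N \frac{1}{x^{1-r}}\,dx \le \frac{1}{N} + \frac{2r}{N^{1+r}} \left( \frac{N^r}{r} \right) = \frac{3}{N}, \tag{3.6}$$

which completes the proof. □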
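The chain of inequalities in (3.6) can be checked numerically by summing the per-step bound of Lemma 3.3 over $j = 2, \ldots, N$ and comparing with $3/N$. The function name below is illustrative; the sketch checks the bound itself, not the quantity $E[(X_k - Y_k)^2]$.

```python
def lemma_3_5_sum(n, r):
    """Sum over j = 2..n of r (1/n^2 + 2 / (n^(1+r) j^(1-r))), as in (3.6)."""
    return sum(
        r * (1.0 / n ** 2 + 2.0 / (n ** (1.0 + r) * j ** (1.0 - r)))
        for j in range(2, n + 1)
    )
```

For example, with $N = 1000$ and $r = 0.3$ the sum stays below $3/N = 0.003$, as the integral comparison in (3.6) guarantees.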

Proof of Theorem 1.2. Let $M = \max_{1 \le k \le N} |X_k - Y_k|$. Fix $x > 0$. Let $T = \max\{k : |X_k - Y_k| > x\}$ if $M > x$, and let $T = 0$ otherwise. For $2 \le k \le N$, define
\[
\rho_k = X_k(W_k - V_k) - (X_k - Y_k)(W_k - r/k) \tag{3.7}
\]
so that by (3.4), $X_{k-1} - Y_{k-1} = \rho_k + (X_k - Y_k)(1 - r/k)$. Let
\[
H_k = \begin{cases} X_k - Y_k, & \text{for } k \ge T, \\ X_T - Y_T + \sum_{j=k+1}^{T} \rho_j, & \text{for } k < T. \end{cases}
\]
This definition is chosen so that
\[
H_{k-1} - H_k = \rho_k - (r/k)H_k 1_{\{k > T\}}. \tag{3.8}
\]
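Our first step is to show
\[
P(M > x) \le x^{-2}E[H_1^2]. \tag{3.9}
\]
To establish (3.9), we mimic the proof of Kolmogorov's maximal inequality in [13]. Let $A_k = \{T = k\}$, so the event that $M > x$ is the event $\bigcup_{k=1}^N A_k$. Then
\begin{align*}
E[H_1^2] &\ge \sum_{k=1}^N E[H_1^2 1_{A_k}] \\
&= \sum_{k=1}^N E[(H_k^2 + 2H_k(H_1 - H_k) + (H_1 - H_k)^2)1_{A_k}] \\
&\ge \sum_{k=1}^N E[H_k^2 1_{A_k}] + 2\sum_{k=1}^N E[H_k(H_1 - H_k)1_{A_k}].
\end{align*}
If $j \le k$, then
\begin{align*}
E[\rho_j \mid \mathcal{F}_k] &= E\bigl[E[\rho_j \mid \mathcal{F}_j] \mid \mathcal{F}_k\bigr] \\
&= E\bigl[X_jE[W_j - V_j \mid \mathcal{F}_j] - (X_j - Y_j)E[W_j - r/j \mid \mathcal{F}_j] \mid \mathcal{F}_k\bigr] = 0. \tag{3.10}
\end{align*}
Therefore,
\begin{align*}
\sum_{k=1}^N E[H_k(H_1 - H_k)1_{A_k}] &= \sum_{k=1}^N E\bigl[E[H_k(\rho_2 + \cdots + \rho_k)1_{A_k} \mid \mathcal{F}_k]\bigr] \\
&= \sum_{k=1}^N E\bigl[H_k 1_{A_k}E[\rho_2 + \cdots + \rho_k \mid \mathcal{F}_k]\bigr] = 0.
\end{align*}
It follows that
\[
E[H_1^2] \ge \sum_{k=1}^N E[H_k^2 1_{A_k}] \ge \sum_{k=1}^N x^2P(A_k) = x^2P(M > x),
\]
which implies (3.9).

We now obtain a bound on $E[H_1^2]$. Using (3.8) and the fact that the random variable $H_k$ and the event $\{k > T\}$ are $\mathcal{F}_k$-measurable, we have
\begin{align*}
E[H_{k-1}^2 \mid \mathcal{F}_k] &= E\bigl[(\rho_k + H_k(1 - (r/k)1_{\{k > T\}}))^2 \mid \mathcal{F}_k\bigr] \\
&= E[\rho_k^2 \mid \mathcal{F}_k] + 2H_k(1 - (r/k)1_{\{k > T\}})E[\rho_k \mid \mathcal{F}_k] + H_k^2(1 - (r/k)1_{\{k > T\}})^2.
\end{align*}
Since $E[\rho_k \mid \mathcal{F}_k] = 0$ by (3.10), it follows that $E[H_{k-1}^2 \mid \mathcal{F}_k] \le E[\rho_k^2 \mid \mathcal{F}_k] + H_k^2$, and thus $E[H_{k-1}^2] \le E[\rho_k^2] + E[H_k^2]$. Since $H_N = 0$, we can combine this result with (3.9) to get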

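\[
P(M > x) \le x^{-2}E[H_1^2] \le x^{-2}\sum_{k=2}^N E[\rho_k^2].
\]
To bound $E[\rho_k^2]$ we recall the definition in (3.7) and use Lemma 3.5 and the fact that $W_k$ is independent of $X_k$ and $Y_k$ to get
\begin{align*}
E[(X_k - Y_k)^2(W_k - r/k)^2] &\le E[(X_k - Y_k)^2]\,E\left[W_k^2 - \frac{2r}{k}W_k + \frac{r^2}{k^2}\right] \\
&\le \frac{3}{N}\left(\frac{2r}{k(k+1)} - \frac{2r^2}{k^2} + \frac{r^2}{k^2}\right) \le \frac{6r}{Nk(k+1)}.
\end{align*}
Combining this result with Lemmas 3.3 and 3.4 with $a = r/k$, we get
\[
E[\rho_k^2] \le r\left(\frac{1}{N^2} + \frac{2}{N^{1+r}k^{1-r}} + \frac{6}{Nk(k+1)}\right). \tag{3.11}
\]
The telescoping sum $\sum_{k=2}^N 6r/[Nk(k+1)]$ is at most $3r/N$, so it follows from (3.6) and (3.11) that
\[
P(M > x) \le x^{-2}\sum_{k=2}^N E[\rho_k^2] \le x^{-2}\left(\frac{3}{N} + \frac{3r}{N}\right) \le \frac{6}{Nx^2}.
\]
Thus,
\[
E[M] = \int_0^\infty P(M > x)\,dx \le \frac{2}{\sqrt{N}} + \int_{2/\sqrt{N}}^\infty \frac{6}{Nx^2}\,dx = \frac{2}{\sqrt{N}} + \frac{3}{\sqrt{N}} = \frac{5}{\sqrt{N}},
\]
which proves the theorem. □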
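The last step bounds $P(M > x)$ by 1 below a cutoff and by $6/(Nx^2)$ above it. The sketch below, with the hypothetical helper `expected_max_bound` and an assumed cutoff of the form $c/\sqrt{N}$, evaluates the resulting bound so that the effect of the cutoff can be inspected. The choice $c = 2$ reproduces $5/\sqrt{N}$; minimizing $c + 6/c$ would give $c = \sqrt{6}$ and the slightly smaller constant $2\sqrt{6} \approx 4.90$.

```python
import math


def expected_max_bound(N: int, c: float = 2.0) -> float:
    """Bound on E[M] obtained by integrating min(1, 6/(N x^2)) with a split at x0 = c/sqrt(N).

    On [0, x0] the tail probability is bounded by 1, contributing x0.
    On [x0, infinity) it is bounded by 6/(N x^2), whose integral is 6/(N x0).
    """
    x0 = c / math.sqrt(N)
    return x0 + 6.0 / (N * x0)


if __name__ == "__main__":
    N = 100  # illustrative population size
    print(expected_max_bound(N, 2.0) * math.sqrt(N))           # constant for the cutoff used in the proof
    print(expected_max_bound(N, math.sqrt(6)) * math.sqrt(N))  # constant for the optimized cutoff
```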



4. The power law. In this section we prove Theorem 1.3, which gives the 
power law for the family-size distribution. Our first lemma gives a bound 
on the moments of the binomial distribution. Throughout this section, we 
allow the value of the constant C to change from line to line. 

Lemma 4.1. Fix $m \ge 1$. There exists a constant $C$ such that for all $n$ and $p$ such that $np \ge 1$, if $X$ has a Binomial$(n,p)$ distribution, then
\[
E\left[\left|\frac{X}{n} - p\right|^m\right] \le C\left(\frac{p}{n}\right)^{m/2}.
\]



Proof. For now, we assume that $p \le 1/2$. The proof is based on two bounds for binomial tail probabilities. If $z > 0$, then
\[
P\left(\frac{X}{n} - p \le -z\right) \le e^{-nz^2/2p}, \tag{4.1}
\]
and if $0 < z < 1 - p$, then
\[
P\left(\frac{X}{n} - p \ge z\right) \le e^{-nz^2/2(p+z)}. \tag{4.2}
\]
Equation (4.1) follows from (3.52) on page 121 of [22]. To prove (4.2), we use the fact that if $p < a < 1$, then $P(X/n \ge a) \le e^{-nH(a)}$, where
\[
H(a) = a\log(a/p) + (1 - a)\log((1 - a)/(1 - p)).
\]
This is proved, for example, in [5]. We have $H'(a) = \log(a/p) - \log((1 - a)/(1 - p))$ and $H''(a) = 1/[a(1 - a)]$. Since $H(p) = H'(p) = 0$, by Taylor's theorem there exists $z \in [p, a]$ such that $H(a) = \frac{1}{2}H''(z)(a - p)^2$. Note that the function $a \mapsto H''(a)$ is decreasing on $(0, 1/2)$ and increasing on $(1/2, 1)$. Therefore, if $a \le 1/2$, then $H(a) \ge \frac{1}{2}H''(a)(a - p)^2 \ge \frac{1}{2a}(a - p)^2$, and if $a > 1/2$, then $H(a) \ge \frac{1}{2}H''(1/2)(a - p)^2 \ge 2(a - p)^2 \ge \frac{1}{2a}(a - p)^2$. Equation (4.2) follows by substituting $z = a - p$.

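Now, using Lemma 5.7 in Chapter 1 of [13], we get
\begin{align*}
E\left[\left|\frac{X}{n} - p\right|^m\right] &= \int_0^p mz^{m-1}P\left(\left|\frac{X}{n} - p\right| > z\right)dz \\
&\quad + \int_p^{1-p} mz^{m-1}P\left(\frac{X}{n} - p > z\right)dz. \tag{4.3}
\end{align*}
Using (4.1) and (4.2) for $z \le p$, and making the substitution $z = y\sqrt{4p/n}$, the first term on the right-hand side is less than or equal to
\begin{align*}
\int_0^p mz^{m-1}\bigl(e^{-nz^2/2p} + e^{-nz^2/2(p+z)}\bigr)\,dz &\le 2m\int_0^\infty z^{m-1}e^{-nz^2/4p}\,dz \\
&= 2m\left(\frac{4p}{n}\right)^{m/2}\int_0^\infty y^{m-1}e^{-y^2}\,dy \le C\left(\frac{p}{n}\right)^{m/2}. \tag{4.4}
\end{align*}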

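Likewise, using $z/(p + z) \ge 1/2$ for $z \ge p$ and substituting $z = 4y/n$, the second term on the right-hand side in (4.3) is less than or equal to
\begin{align*}
\int_p^{1-p} mz^{m-1}e^{-nz^2/2(p+z)}\,dz &\le m\int_p^{1-p} z^{m-1}e^{-nz/4}\,dz \\
&\le m\left(\frac{4}{n}\right)^{m}\int_0^\infty y^{m-1}e^{-y}\,dy \le \frac{C}{n^m}. \tag{4.5}
\end{align*}
It follows from (4.3), (4.4) and (4.5) that if $p \le 1/2$ and $np \ge 1$, then
\[
E\left[\left|\frac{X}{n} - p\right|^m\right] \le C\left(\frac{1}{n}\right)^{m} + C\left(\frac{p}{n}\right)^{m/2} \le C\left(\frac{p}{n}\right)^{m/2}. \tag{4.6}
\]
The fact that $np \ge 1$ was used only for the second inequality in (4.6). Therefore, if $p > 1/2$ and $np \ge 1$, we can apply the first inequality in (4.6) to $n - X$, which has a Binomial$(n, 1-p)$ distribution, to get
\[
E\left[\left|\frac{X}{n} - p\right|^m\right] = E\left[\left|\frac{n - X}{n} - (1 - p)\right|^m\right] \le C\left(\frac{1}{n}\right)^{m} + C\left(\frac{1-p}{n}\right)^{m/2} \le C\left(\frac{p}{n}\right)^{m/2},
\]
which completes the proof of the lemma. □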
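For small $n$ the moment in Lemma 4.1 can be computed exactly from the binomial probability mass function, which gives a concrete feel for the normalization $(p/n)^{m/2}$. The sketch below is illustrative only: the helper names `binomial_abs_moment` and `normalized_moment` are assumptions, and it says nothing about the value of the constant $C$ beyond the cases evaluated.

```python
import math


def binomial_abs_moment(n: int, p: float, m: float) -> float:
    """Exact E|X/n - p|^m for X ~ Binomial(n, p), by direct summation over the pmf."""
    total = 0.0
    for x in range(n + 1):
        pmf = math.comb(n, x) * p**x * (1.0 - p) ** (n - x)
        total += abs(x / n - p) ** m * pmf
    return total


def normalized_moment(n: int, p: float, m: float) -> float:
    """The moment divided by (p/n)^(m/2); Lemma 4.1 says this ratio is bounded when n*p >= 1."""
    return binomial_abs_moment(n, p, m) / (p / n) ** (m / 2)


if __name__ == "__main__":
    for n, p in ((50, 0.1), (200, 0.05), (400, 0.5)):
        print(n, p, [round(normalized_moment(n, p, m), 4) for m in (1, 2, 3, 4)])
```

For $m = 2$ the ratio is exactly $1 - p$, since the variance of $X/n$ is $p(1-p)/n$.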

The next lemma bounds the moments of $X_k$. Recall that $X_k = X_{k,N}$ is the fraction of the first $N$ individuals with one of the first $k$ types.

Lemma 4.2. Fix a real number $m \ge 1$. Then there is a positive constant $C$ such that for all $k \ge 1$,
\[
E[X_k^m] \le \left(\frac{k}{N}\right)^{mr}\left(1 + \frac{C}{k}\right).
\]



Proof. Let $M_{k,l} = \sum_{j=1}^k R_{j,l}$ be the number of individuals at time $T_l$ with types in $\{1, \dots, k\}$. Note that $M_{k,N} = NX_k$. Conditional on $M_{k,l}$, the probability that the $(l + 1)$st individual born has a type in $\{1, \dots, k\}$ is $(1 - r)M_{k,l}/l$. Therefore,
\[
E[M_{k,l+1}^m \mid M_{k,l}] = M_{k,l}^m + \frac{(1-r)M_{k,l}}{l}\bigl[(M_{k,l} + 1)^m - M_{k,l}^m\bigr].
\]
Since $b^m - a^m = \int_a^b mx^{m-1}\,dx \le mb^{m-1}(b - a)$ for $0 \le a \le b$, the above is less than or equal to
\begin{align*}
&M_{k,l}^m + \frac{(1-r)M_{k,l}}{l}\,m(M_{k,l} + 1)^{m-1} \\
&\qquad = M_{k,l}^m + \frac{(1-r)m}{l}M_{k,l}^m + \frac{(1-r)m}{l}\bigl[(M_{k,l} + 1)^{m-1} - M_{k,l}^{m-1}\bigr]M_{k,l}.
\end{align*}
Using the integration inequality again, this is less than or equal to
\begin{align*}
&M_{k,l}^m\left[1 + \frac{(1-r)m}{l}\right] + \frac{(1-r)m}{l}\bigl[(m - 1)(M_{k,l} + 1)^{m-2}\bigr]M_{k,l} \\
&\qquad \le M_{k,l}^m\left(1 + \frac{(1-r)m}{l}\right) + \frac{m(m-1)}{l}(M_{k,l} + 1)^{m-2}M_{k,l}.
\end{align*}
Since $M_{k,l} \ge 1$, we have
\[
E[M_{k,l+1}^m \mid M_{k,l}] \le M_{k,l}^m\left(1 + \frac{(1-r)m}{l}\right) + \frac{C}{l}M_{k,l}^{m-1}. \tag{4.7}
\]
We now establish the lemma for integer values of $m$ by induction. When $m = 1$, the result is an immediate consequence of Lemma 3.2 and the inequality $e^{r/k} \le 1 + C/k$. Suppose the result holds for $m - 1$. Then, since $M_{k,l} = lX_{k,l}$, we have
\[
E[M_{k,l}^{m-1}] = l^{m-1}E[X_{k,l}^{m-1}] \le l^{m-1}\left(\frac{k}{l}\right)^{(m-1)r}\left(1 + \frac{C}{k}\right) \le Ck^{(m-1)r}l^{(m-1)(1-r)}.
\]
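Therefore, taking expectations of both sides in (4.7), we get
\[
E[M_{k,l+1}^m] \le \left(1 + \frac{(1-r)m}{l}\right)E[M_{k,l}^m] + Ck^{(m-1)r}l^{(m-1)(1-r)-1}.
\]
Since $M_{k,k} = k$, iterating the last result shows that $E[X_k^m] = E[M_{k,N}^m]/N^m$ is at most
\[
N^{-m}\left[k^m\prod_{j=k}^{N-1}\left(1 + \frac{(1-r)m}{j}\right) + \sum_{l=k}^{N-1} Ck^{(m-1)r}l^{(m-1)(1-r)-1}\prod_{j=l+1}^{N-1}\left(1 + \frac{(1-r)m}{j}\right)\right].
\]
Since $1 + x \le e^x$ for $x \ge 0$, we have
\begin{align*}
\prod_{j=k}^{N-1}\left(1 + \frac{(1-r)m}{j}\right) &\le \exp\left((1-r)m\sum_{j=k}^{N-1}\frac{1}{j}\right) \le \exp\left((1-r)m\left(\frac{1}{k} + \int_k^N \frac{dx}{x}\right)\right) \\
&= \exp\left(\frac{(1-r)m}{k} + (1-r)m\log\left(\frac{N}{k}\right)\right).
\end{align*}
Thus,
\begin{align*}
E[X_k^m] &\le N^{-m}\left[k^m e^{(1-r)m/k}\left(\frac{N}{k}\right)^{(1-r)m} + C\sum_{l=k}^{N-1} k^{(m-1)r}l^{(m-1)(1-r)-1}\left(\frac{N}{l}\right)^{(1-r)m}\right] \\
&\le \left(\frac{k}{N}\right)^{mr}\left[1 + \frac{C}{k} + Ck^{-r}\sum_{l=k}^{N-1} l^{-2+r}\right] \\
&\le \left(\frac{k}{N}\right)^{mr}\left(1 + \frac{C}{k}\right).
\end{align*}
The result for integer values of $m$ follows by induction.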

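Now suppose $n < m < n + 1$, where $n$ is a positive integer. Let
\[
p = (n - m + 1)^{-1}
\]
and let
\[
q = (m - n)^{-1}.
\]
Note that $p^{-1} + q^{-1} = 1$ and $n/p + (n + 1)/q = m$. By Hölder's inequality,
\[
E[X_k^m] = E\bigl[X_k^{n/p}X_k^{(n+1)/q}\bigr] \le E[X_k^n]^{1/p}E[X_k^{n+1}]^{1/q} \le \left(\frac{k}{N}\right)^{mr}\left(1 + \frac{C}{k}\right),
\]
so the lemma is true for all real numbers $m \ge 1$. □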
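The case $m = 1$ can be made concrete, because the recursion $E[M_{k,l+1} \mid M_{k,l}] = M_{k,l}(1 + (1-r)/l)$ from the proof gives $E[X_k]$ as an explicit product. The sketch below evaluates that product and compares it with $(k/N)^r$. The function name `expected_fraction` and the parameter values are illustrative assumptions.

```python
def expected_fraction(k: int, N: int, r: float) -> float:
    """E[X_k] computed from E[M_{k,l+1}] = E[M_{k,l}] * (1 + (1 - r)/l), starting at M_{k,k} = k."""
    mean = float(k)
    for l in range(k, N):
        mean *= 1.0 + (1.0 - r) / l
    return mean / N


if __name__ == "__main__":
    N, r = 1000, 0.5  # illustrative values
    for k in (1, 2, 10, 100, 1000):
        ratio = expected_fraction(k, N, r) / (k / N) ** r
        print(f"k={k:5d}  E[X_k] / (k/N)^r = {ratio:.5f}")
```

The printed ratio should approach 1 as $k$ grows, in line with the factor $1 + C/k$ in the lemma.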

To prove Theorem 1.3, we will approximate the family sizes $NV_kX_k$ by $NW_k(k/N)^r$. To use this approximation, we will need a bound on the probability that the difference between these two quantities is large. Note that
\[
V_kX_k - W_k\left(\frac{k}{N}\right)^r = X_k(V_k - W_k) + W_k\left(X_k - \left(\frac{k}{N}\right)^r\right). \tag{4.8}
\]
The next two lemmas deal separately with the two terms on the right-hand side of (4.8).



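Lemma 4.3. There is a positive constant $C$ so that for all $\delta > 0$,
\[
\sum_{k=1}^N P\left(\left|W_k\left(X_k - \left(\frac{k}{N}\right)^r\right)\right| > \frac{\delta S}{2N}\right) \le C\left(\frac{N^{1-r}}{\delta S}\right)^{2/(3-2r)}.
\]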



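Proof. Conditioning on $\mathcal{F}_k$ and noting that $W_k$ is independent of $\mathcal{F}_k$ gives
\[
E\left[W_k^2\left(X_k - \left(\frac{k}{N}\right)^r\right)^2\right] = E[W_k^2]\,E\left[\left(X_k - \left(\frac{k}{N}\right)^r\right)^2\right].
\]
If $k \ge 2$, Lemmas 3.2 and 4.2 give that the above is equal to
\begin{align*}
\frac{2r}{k(k+1)}\left(E[X_k^2] - 2E[X_k]\left(\frac{k}{N}\right)^r + \left(\frac{k}{N}\right)^{2r}\right)
&\le \frac{2r}{k^2}\left(\left(\frac{k}{N}\right)^{2r}\left(1 + \frac{C}{k}\right) - 2\left(\frac{k}{N}\right)^{2r} + \left(\frac{k}{N}\right)^{2r}\right) \\
&\le \frac{C}{N^{2r}k^{3-2r}}.
\end{align*}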

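Fix a positive integer $L$. Using a trivial inequality for $k \le L$ and Chebyshev's inequality,
\begin{align*}
\sum_{k=1}^N P\left(W_k\left|X_k - \left(\frac{k}{N}\right)^r\right| > \frac{\delta S}{2N}\right) &\le L + \sum_{k=L+1}^N \left(\frac{2N}{\delta S}\right)^2\frac{C}{N^{2r}k^{3-2r}} \\
&\le L + \frac{CN^{2-2r}}{(\delta S)^2}\int_L^\infty \frac{1}{x^{3-2r}}\,dx \\
&= L + \frac{CN^{2-2r}L^{-(2-2r)}}{(2-2r)(\delta S)^2}. \tag{4.9}
\end{align*}
If $L = (N^{1-r}/(\delta S))^{2/(3-2r)}$, then the right-hand side of (4.9) is bounded by
\[
\left(\frac{N^{1-r}}{\delta S}\right)^{2/(3-2r)} + C\left(\frac{N^{1-r}}{\delta S}\right)^{2 - (2-2r)2/(3-2r)} \le C\left(\frac{N^{1-r}}{\delta S}\right)^{2/(3-2r)},
\]
as claimed. □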
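The choice of $L$ in the proof of Lemma 4.3 balances the two terms of (4.9). Writing $A = N^{1-r}/(\delta S)$, the first term is $A^{2/(3-2r)}$ and the second is a constant times $A^2L^{-(2-2r)}$. The sketch below checks with exact rational arithmetic that both carry the same power of $A$. The function name is an illustrative assumption, and $r$ is restricted to rational values only so that the comparison is exact.

```python
from fractions import Fraction


def exponents_after_choice_of_L(r: Fraction):
    """Exponents of A = N^(1-r)/(delta*S) in the two terms of (4.9) when L = A^(2/(3-2r))."""
    first = Fraction(2) / (3 - 2 * r)   # exponent of A in the term L itself
    second = 2 - (2 - 2 * r) * first    # exponent of A in A^2 * L^(-(2-2r))
    return first, second


if __name__ == "__main__":
    for r in (Fraction(1, 10), Fraction(1, 3), Fraction(1, 2), Fraction(9, 10)):
        print(r, exponents_after_choice_of_L(r))
```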



Lemma 4.4. There is a constant $C$ so that for all $\delta > 0$, we have
\[
\sum_{k=1}^N P\left(|X_k(V_k - W_k)| > \frac{\delta S}{2N}\right) \le 1 + \frac{CN}{(\delta S)^{3/2(1-r)}}. \tag{4.10}
\]



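Proof. Recall from Section 2 that $W_k = \xi_k\tilde{W}_k$, where $\xi_k = 1_{\{W_k > 0\}}$ has a Bernoulli$(r)$ distribution and $\tilde{W}_k$ has a Beta$(1, k-1)$ distribution and is independent of $\xi_k$. Also recall that $\tilde{V}_k = (NV_kX_k - 1)1_{\{W_k > 0\}}$ is a random variable such that the conditional distribution of $\tilde{V}_k$ given $\mathcal{G}_k$ is Binomial$(NX_k - k, \tilde{W}_k)$. Using (2.3), we see that for all $k \ge 2$ we have
\begin{align*}
P\left(|X_k(V_k - W_k)| > \frac{\delta S}{2N}\right) &= P\left(|NX_k(V_k - W_k)|1_{\{W_k > 0\}} > \frac{\delta S}{2}\right) \\
&= P\left(\bigl|(1 - k\tilde{W}_k) + (\tilde{V}_k - \tilde{W}_k(NX_k - k))\bigr|1_{\{W_k > 0\}} > \frac{\delta S}{2}\right) \\
&\le P\left(|1 - k\tilde{W}_k| > \frac{\delta S}{4}\right) + P\left(|\tilde{V}_k - \tilde{W}_k(NX_k - k)| > \frac{\delta S}{4}\right). \tag{4.11}
\end{align*}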

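Let $m = 3/2(1 - r)$. The reason for this choice will become clear in (4.15). Until then the reader should keep in mind that $m$ is a fixed real number. Since $\Gamma(x + 1) = x\Gamma(x)$ for all real $x$, we have $\Gamma(k)/\Gamma(m + k) \le Ck^{-m}$ for some constant $C$. Therefore,
\[
E[\tilde{W}_k^m] = \frac{\Gamma(k)\Gamma(m+1)}{\Gamma(m+k)} \le Ck^{-m}, \tag{4.12}
\]
so using $(a + b)^m \le 2^m(a^m + b^m)$ for $a, b \ge 0$, we have
\[
E[(1 + k\tilde{W}_k)^m] \le 2^m(1 + E[(k\tilde{W}_k)^m]) \le C.
\]
Therefore, by Markov's inequality, if $k \ge 2$, then
\[
P\left(|1 - k\tilde{W}_k| > \frac{\delta S}{4}\right) \le P\left(|1 + k\tilde{W}_k| > \frac{\delta S}{4}\right) \le \frac{4^mE[(1 + k\tilde{W}_k)^m]}{(\delta S)^m} \le \frac{C}{(\delta S)^m}, \tag{4.13}
\]
which bounds the first term on the right-hand side of (4.11).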

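Because of the restriction $np \ge 1$ in Lemma 4.1, we must split the second term in (4.11) into two pieces, depending on the value of $\tilde{W}_k(NX_k - k)$. Let $V_k'$ be a random variable such that, conditional on $\mathcal{G}_k$, the distribution of $V_k'$ is Binomial$(NX_k - k, 1/(NX_k - k))$. We set $V_k' = 0$ if $NX_k - k = 0$. Note that when $\tilde{W}_k(NX_k - k) \le 1$, the conditional distribution of $V_k'$ given $\mathcal{G}_k$ stochastically dominates the conditional distribution of $\tilde{V}_k$ given $\mathcal{G}_k$. By Lemma 4.1, $E[|V_k' - 1|^m \mid \mathcal{G}_k] \le C$. Note also that $|\tilde{V}_k - \tilde{W}_k(NX_k - k)| = 0$ on the event $\{NX_k - k = 0\}$. Therefore, if $k \ge 2$, then
\begin{align*}
&P\left(|\tilde{V}_k - \tilde{W}_k(NX_k - k)|1_{\{\tilde{W}_k(NX_k - k) \le 1\}} > \frac{\delta S}{4}\right) \\
&\qquad = E\left[P\left(|\tilde{V}_k - \tilde{W}_k(NX_k - k)|1_{\{\tilde{W}_k(NX_k - k) \le 1\}} > \frac{\delta S}{4} \,\middle|\, \mathcal{G}_k\right)\right] \\
&\qquad \le E\left[P\left(|V_k' - 1| + 1 > \frac{\delta S}{4} \,\middle|\, \mathcal{G}_k\right)\right] \\
&\qquad \le \left(\frac{4}{\delta S}\right)^m E\bigl[E[(|V_k' - 1| + 1)^m \mid \mathcal{G}_k]\bigr], \tag{4.14}
\end{align*}
which is at most $C/(\delta S)^m$ by the bound on $E[|V_k' - 1|^m \mid \mathcal{G}_k]$. By Lemma 4.1, we get, for $k \ge 2$,
\begin{align*}
&P\left(|\tilde{V}_k - \tilde{W}_k(NX_k - k)|1_{\{\tilde{W}_k(NX_k - k) > 1\}} > \frac{\delta S}{4}\right) \\
&\qquad = E\left[P\left(\left|\frac{\tilde{V}_k}{NX_k - k} - \tilde{W}_k\right|1_{\{\tilde{W}_k(NX_k - k) > 1\}} > \frac{\delta S}{4(NX_k - k)} \,\middle|\, \mathcal{G}_k\right)\right] \\
&\qquad \le E\left[\left(\frac{4(NX_k - k)}{\delta S}\right)^m E\left[\left|\frac{\tilde{V}_k}{NX_k - k} - \tilde{W}_k\right|^m \,\middle|\, \mathcal{G}_k\right]1_{\{\tilde{W}_k(NX_k - k) > 1\}}\right] \\
&\qquad \le E\left[\left(\frac{4(NX_k - k)}{\delta S}\right)^m C\left(\frac{\tilde{W}_k}{NX_k - k}\right)^{m/2}\right] \\
&\qquad \le \frac{C}{(\delta S)^m}E\bigl[\tilde{W}_k^{m/2}(NX_k - k)^{m/2}\bigr].
\end{align*}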

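By conditioning on $\mathcal{F}_k$ and noting that $\tilde{W}_k$ is independent of this $\sigma$-field, we see that this is at most
\[
\frac{C}{(\delta S)^m}E[\tilde{W}_k^{m/2}]\,E[(NX_k)^{m/2}] \le \frac{C}{(\delta S)^m}\,k^{-m/2}N^{m/2}\left(\frac{k}{N}\right)^{mr/2} \tag{4.15}
\]
by (4.12) and Lemma 4.2. Recalling that $m = 3/2(1 - r)$, the above is at most
\[
\frac{C}{(\delta S)^m}\left(\frac{N}{k}\right)^{m(1-r)/2} = \frac{C}{(\delta S)^m}\left(\frac{N}{k}\right)^{3/4}.
\]
Note that
\[
\sum_{k=2}^N \left(\frac{N}{k}\right)^{3/4} = N^{3/4}\sum_{k=2}^N k^{-3/4} \le CN^{3/4}N^{1/4} = CN.
\]
Combining this fact with (4.11), (4.13), (4.14) and (4.15), which hold for $k \ge 2$, we get
\[
\sum_{k=1}^N P\left(|X_k(V_k - W_k)| > \frac{\delta S}{2N}\right) \le 1 + \frac{CN}{(\delta S)^m},
\]
which completes the proof. □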
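The Beta moment formula in (4.12) is easy to evaluate with the log-gamma function, which avoids overflow for large $k$. The following sketch is illustrative only: the helper name `beta_moment` is an assumption. It prints $k^m$ times the moment, which (4.12) asserts is bounded in $k$.

```python
import math


def beta_moment(k: int, m: float) -> float:
    """E[W^m] for W ~ Beta(1, k-1) with k >= 2: Gamma(k) * Gamma(m+1) / Gamma(m+k)."""
    return math.exp(math.lgamma(k) + math.lgamma(m + 1) - math.lgamma(m + k))


if __name__ == "__main__":
    m = 3  # illustrative exponent
    for k in (2, 10, 100, 10_000):
        print(f"k={k:6d}  k^m * E[W^m] = {k**m * beta_moment(k, m):.5f}")
```

For integer $m$ the product $k^m E[\tilde{W}_k^m]$ equals $m!\,k^m/(k(k+1)\cdots(k+m-1))$, so it increases to $m!$ as $k$ grows.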



Now that we have shown that $V_kX_k$ and $W_k(k/N)^r$ are close with high probability, the next step is to calculate the probability that $W_k(k/N)^r$ is large. The next two lemmas provide upper and lower bounds on the probability that $W_k(k/N)^r$ is large. Recall from (1.3) that
\[
g(S) = r\,\Gamma\left(\frac{2-r}{1-r}\right)NS^{-1/(1-r)}.
\]
Lemma 4.5. There is a constant $C$ so that for $0 < \delta < 1/2$ and $S \le [\ldots]N^{1-r}$,
\[
\sum_{k=1}^N P\left(W_k\left(\frac{k}{N}\right)^r > \frac{(1-\delta)S}{N}\right) \le C + g(S)(1 + C\delta).
\]

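Proof. The $C$ on the left takes care of the term $k = 1$. Since the conditional distribution of $W_k$ given $W_k > 0$ is Beta$(1, k-1)$, we have, for all $k \ge 2$ and $a \in (0,1)$,
\[
P(W_k > a) = r\int_a^1 (k-1)(1 - x)^{k-2}\,dx = r(1 - a)^{k-1}. \tag{4.16}
\]
Using the facts that $(1 - a/x)^x \le e^{-a}$ if $0 < a < x$ and $1/(1 - x) \le 1 + 2x$ if $0 \le x \le 1/2$, we have, for $k \ge 2$,
\[
P\left(W_k\left(\frac{k}{N}\right)^r > \frac{(1-\delta)S}{N}\right) = r\left(1 - \frac{S(1-\delta)}{N^{1-r}k^r}\right)^{k-1} \le re^{-S(1-\delta)(k/N)^{1-r}}\left(1 + \frac{2S}{N^{1-r}k^r}\right). \tag{4.17}
\]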

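Note that
\[
\sum_{k=2}^N e^{-S(1-\delta)(k/N)^{1-r}}\left(1 + \frac{2S}{N^{1-r}k^r}\right) \le \int_0^\infty e^{-S(1-\delta)(x/N)^{1-r}}\left(1 + \frac{2S}{N^{1-r}x^r}\right)dx. \tag{4.18}
\]
Letting $y = S(1 - \delta)(x/N)^{1-r}$, which means that $x = y^{1/(1-r)}M$ where $M = N(S(1 - \delta))^{-1/(1-r)}$ and $dx = y^{1/(1-r)-1}M/(1 - r)\,dy$, we have
\begin{align*}
\int_0^\infty e^{-S(1-\delta)(x/N)^{1-r}}\,dx &= \int_0^\infty e^{-y}y^{1/(1-r)-1}\left(\frac{M}{1-r}\right)dy \\
&= \frac{M}{1-r}\,\Gamma\left(\frac{1}{1-r}\right) = \Gamma\left(\frac{2-r}{1-r}\right)N(S(1-\delta))^{-1/(1-r)}. \tag{4.19}
\end{align*}
The same change of variables gives
\begin{align*}
\frac{2S}{N^{1-r}}\int_0^\infty x^{-r}e^{-S(1-\delta)(x/N)^{1-r}}\,dx &= \frac{2S}{N^{1-r}}\int_0^\infty e^{-y}\bigl(y^{1/(1-r)}M\bigr)^{-r}y^{1/(1-r)-1}\left(\frac{M}{1-r}\right)dy \\
&= \frac{2SM^{1-r}}{(1-r)N^{1-r}}\int_0^\infty e^{-y}\,dy \\
&= \frac{2}{(1-r)(1-\delta)} \le C. \tag{4.20}
\end{align*}
Because $(1 - \delta)^{-1/(1-r)} \le 1 + C\delta$, the claim follows from (4.17)-(4.20). □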
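The approximation behind Lemmas 4.5 and 4.6 can be illustrated numerically by summing the exact tail probabilities from (4.16) and comparing the result with $g(S)$. The sketch below takes $\delta = 0$ and omits the $k = 1$ term. The function names `g` and `expected_count` and the parameter values are illustrative assumptions.

```python
import math


def g(S: float, N: int, r: float) -> float:
    """g(S) = r * Gamma((2 - r)/(1 - r)) * N * S^(-1/(1 - r)), as in (1.3)."""
    return r * math.gamma((2 - r) / (1 - r)) * N * S ** (-1.0 / (1 - r))


def expected_count(S: float, N: int, r: float) -> float:
    """Sum over k = 2..N of P(W_k (k/N)^r > S/N) = r * (1 - a_k)^(k-1), a_k = (S/N)(N/k)^r."""
    total = 0.0
    for k in range(2, N + 1):
        a = (S / N) * (N / k) ** r
        if a < 1.0:  # the probability is zero when the threshold exceeds 1
            total += r * (1.0 - a) ** (k - 1)
    return total


if __name__ == "__main__":
    N, r = 100_000, 0.5  # illustrative values
    for S in (5, 20, 80):
        print(f"S={S:3d}  sum = {expected_count(S, N, r):10.3f}  g(S) = {g(S, N, r):10.3f}")
```

With $r = 1/2$ one has $g(S) = N/S^2$, so the two columns can be compared by eye.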

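Lemma 4.6. There is a constant $C$ so that for $0 < \delta < 1/2$ and $S \le [\ldots]N^{1-r}$,
\[
\sum_{k=1}^N P\left(W_k\left(\frac{k}{N}\right)^r > \frac{(1+\delta)S}{N}\right) \ge -C + g(S)(1 - C\delta - Ce^{-S/2}).
\]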

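Proof. Recall from the beginning of the proof of Lemma 3.2 that if $0 \le x \le 1/2$, then $\log(1 - x) \ge -(x + x^2)$. It follows that if $0 \le a/y \le 1/2$ and $a \ge 0$, then $y\log(1 - a/y) \ge -a - a^2/y$, and so
\[
\left(1 - \frac{a}{y}\right)^y \ge e^{-a}e^{-a^2/y} \ge e^{-a}\left(1 - \frac{a^2}{y}\right).
\]
Therefore, if $k \ge 2$, then
\begin{align*}
P\left(W_k\left(\frac{k}{N}\right)^r > \frac{(1+\delta)S}{N}\right) &= r\left(1 - \frac{S(1+\delta)}{N^{1-r}k^r}\right)^{k-1} \\
&\ge r\left(1 - \frac{S(1+\delta)}{N^{1-r}k^r}\right)^{k} \\
&\ge re^{-S(1+\delta)(k/N)^{1-r}}\left(1 - \frac{S^2(1+\delta)^2k^{1-2r}}{N^{2-2r}}\right). \tag{4.21}
\end{align*}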
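We have
\[
\sum_{k=2}^N e^{-S(1+\delta)(k/N)^{1-r}} \ge \int_0^\infty e^{-S(1+\delta)(x/N)^{1-r}}\,dx - 2 - \int_N^\infty e^{-S(1+\delta)(x/N)^{1-r}}\,dx. \tag{4.22}
\]
It follows from (4.19) with $\delta$ replaced by $-\delta$ and $M = N(S(1 + \delta))^{-1/(1-r)}$ that
\begin{align*}
\int_0^\infty e^{-S(1+\delta)(x/N)^{1-r}}\,dx &= \Gamma\left(\frac{2-r}{1-r}\right)N(S(1+\delta))^{-1/(1-r)} \\
&\ge \Gamma\left(\frac{2-r}{1-r}\right)NS^{-1/(1-r)}(1 - C\delta). \tag{4.23}
\end{align*}
To estimate the second term in (4.21) we note that
\[
\sum_{k=2}^N e^{-S(1+\delta)(k/N)^{1-r}}\,\frac{S^2(1+\delta)^2k^{1-2r}}{N^{2-2r}} \le C\int_0^\infty e^{-S(1+\delta)(x/N)^{1-r}}\,\frac{S^2(1+\delta)^2x^{1-2r}}{N^{2-2r}}\,dx.
\]
Making the change of variables $x = y^{1/(1-r)}M$ and reasoning as in (4.20), we see that this equals
\[
\frac{CS^2(1+\delta)^2M^{2-2r}}{(1-r)N^{2-2r}}\int_0^\infty ye^{-y}\,dy = \frac{C}{1-r}. \tag{4.24}
\]
For all real numbers $b$, there is a constant $C$ such that $x^be^{-x/2} \le C$ for all $x \ge 1$. Using this fact and our favorite change of variables, we get
\begin{align*}
\int_N^\infty e^{-S(1+\delta)(x/N)^{1-r}}\,dx &= \int_{S(1+\delta)}^\infty e^{-y}y^{1/(1-r)-1}\left(\frac{M}{1-r}\right)dy \\
&\le \frac{M}{1-r}\int_S^\infty e^{-y}y^{1/(1-r)-1}\,dy \\
&\le \frac{CM}{1-r}\int_S^\infty e^{-y/2}\,dy \le CNS^{-1/(1-r)}e^{-S/2}. \tag{4.25}
\end{align*}
The lemma now follows by combining (4.21)-(4.25). □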

Proof of Theorem 1.3. Let $A_k$ be the event that $\{NX_kV_k \ge S\}$, so $F_{S,N} = \sum_{k=1}^N 1_{A_k}$. First, note that if the theorem is true for $S = [\ldots]N^{1-r}$, then we know that $E[F_{S,N}] \le C$ for all $S \ge [\ldots]N^{1-r}$, which implies the assertion in the theorem. Therefore, it suffices to prove the result for $S \le [\ldots]N^{1-r}$, in which case the conclusions of Lemmas 4.5 and 4.6 will hold as long as we choose $\delta < 1/2$. Let $A_k^- = \{NW_k(k/N)^r > (1 - \delta)S\}$ and let $A_k^+ = \{NW_k(k/N)^r > (1 + \delta)S\}$. Let $F_S^- = \sum_{k=1}^N 1_{A_k^-}$ and $F_S^+ = \sum_{k=1}^N 1_{A_k^+}$. Writing $F_S$ for $F_{S,N}$, we have
\[
|F_S - g(S)| \le |F_S - F_S^-| + |F_S^- - E[F_S^-]| + |E[F_S^-] - g(S)|. \tag{4.26}
\]
To prove the theorem, we will bound the expectations of the three terms on the right-hand side of (4.26).

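Note that $A_k^+ \subset A_k^-$ for all $k$ and $A_k \,\triangle\, A_k^- \subset (A_k^- \setminus A_k^+) \cup \{|V_kX_k - W_k(k/N)^r| > \delta S/N\}$. By Lemmas 4.5 and 4.6, we have
\[
\sum_{k=1}^N P(A_k^- \setminus A_k^+) \le C + Cg(S)\{\delta + e^{-S/2}\}. \tag{4.27}
\]
By (4.8) and Lemmas 4.3 and 4.4, we have
\begin{align*}
\sum_{k=1}^N P\left(\left|V_kX_k - W_k\left(\frac{k}{N}\right)^r\right| > \frac{\delta S}{N}\right) &\le 1 + C\left(\frac{N^{1-r}}{\delta S}\right)^{2/(3-2r)} + \frac{CN}{(\delta S)^{3/2(1-r)}} \\
&\le 1 + Cg(S)\left\{\frac{(NS^{-1/(1-r)})^{-1/(3-2r)}}{\delta^{2/(3-2r)}} + \frac{S^{-1/2(1-r)}}{\delta^{3/2(1-r)}}\right\}. \tag{4.28}
\end{align*}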

Combining the last two results, we get
\[
E[|F_S - F_S^-|] \le C + Cg(S)(D_1 + D_2), \tag{4.29}
\]
where $D_1$ and $D_2$ are the terms in braces in (4.27) and (4.28). To bound the second term of (4.26), we use Jensen's inequality and the fact that the $A_k^-$ are independent to get
\begin{align*}
E[|F_S^- - E[F_S^-]|] &\le E[(F_S^- - E[F_S^-])^2]^{1/2} = \mathrm{Var}(F_S^-)^{1/2} \\
&= \left[\sum_{k=1}^N \mathrm{Var}(1_{A_k^-})\right]^{1/2} \le \left[\sum_{k=1}^N P(A_k^-)\right]^{1/2} \\
&\le Cg(S)^{1/2} \le Cg(S)(NS^{-1/(1-r)})^{-1/2}. \tag{4.30}
\end{align*}

Furthermore, note that since Lemma 4.5 gives an upper bound for $E[F_S^-]$ that is greater than $g(S)$ and Lemma 4.6 gives a lower bound for $E[F_S^+] \le E[F_S^-]$ that is smaller than $g(S)$, the difference $|E[F_S^-] - g(S)|$ is less than or equal to the difference between these two bounds, which itself was bounded in (4.27). Combining this observation with (4.26), (4.29) and (4.30), we see that
\[
E[|F_S - g(S)|] \le C + Cg(S)\bigl(D_1 + D_2 + (NS^{-1/(1-r)})^{-1/2}\bigr).
\]
To prove the theorem, we need to show that each part of $D_1 + D_2$ is bounded by $S^{-a}$ or $(NS^{-1/(1-r)})^{-b}$ for some positive constants $a$ and $b$. Letting $R = NS^{-1/(1-r)}$ to simplify notation, it is enough to bound
\[
\delta + \frac{R^{-1/(3-2r)}}{\delta^{2/(3-2r)}} + \frac{S^{-1/2(1-r)}}{\delta^{3/2(1-r)}}.
\]
To do this, we let $\delta = A(S^{-c} + R^{-d})$, where $3c < 1$ and $2d < 1$, and choose $A$ to ensure that $\delta < 1/2$. To optimize the bound we set $(1 - 3c)/2(1 - r) = c$ and $(1 - 2d)/(3 - 2r) = d$. Solving gives $c = 1/(5 - 2r)$ and $d = 1/(5 - 2r)$. □
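The two balancing equations at the end of the proof are linear in $c$ and $d$, so the stated solutions can be verified with exact rational arithmetic. A minimal sketch, with the illustrative function name `optimal_exponents` and $r$ restricted to rational values:

```python
from fractions import Fraction


def optimal_exponents(r: Fraction):
    """Return (c, d) = (1/(5-2r), 1/(5-2r)) after checking both balancing equations exactly."""
    c = Fraction(1) / (5 - 2 * r)
    d = Fraction(1) / (5 - 2 * r)
    # (1 - 3c) / (2(1 - r)) = c  and  (1 - 2d) / (3 - 2r) = d
    assert (1 - 3 * c) / (2 * (1 - r)) == c
    assert (1 - 2 * d) / (3 - 2 * r) == d
    # the side conditions 3c < 1 and 2d < 1 required in the proof
    assert 3 * c < 1 and 2 * d < 1
    return c, d


if __name__ == "__main__":
    for r in (Fraction(1, 10), Fraction(1, 2), Fraction(9, 10)):
        print(r, optimal_exponents(r))
```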

5. Sizes of the largest families. In this section we study the largest families, whose sizes are $O(N^{1-r})$, and we prove Propositions 1.4 and 1.5. The key to our arguments is the following well-known result about Yule processes, which is discussed in Chapter III of [7]. Suppose $(X(t), t \ge 0)$ is a Yule process started with one individual at time zero in which each individual splits into two at rate $\lambda$. Then, there exists a random variable $W$ such that
\[
\lim_{t \to \infty} e^{-\lambda t}X(t) = W \quad \text{a.s.}
\]
and $W$ has an exponential distribution with mean 1. A consequence of this fact is that if $X_1(t), \dots, X_k(t)$ are $k$ independent Yule processes, each started with one individual at time zero, then
\[
\lim_{t \to \infty} \frac{X_1(t)}{X_1(t) + \cdots + X_k(t)} = B \quad \text{a.s.}, \tag{5.1}
\]
where $B$ has the Beta$(1, k-1)$ distribution.
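The exponential limit can be checked without simulation. The sketch below relies on a standard fact that is not stated in the text above: for a Yule process with splitting rate $\lambda$ started from one individual, $X(t)$ has a geometric distribution on $\{1, 2, \dots\}$ with success probability $e^{-\lambda t}$. Under that assumption, the tail of $e^{-\lambda t}X(t)$ is explicit and can be compared with $e^{-w}$. The function name `yule_tail` is illustrative.

```python
import math


def yule_tail(w: float, t: float, lam: float = 1.0) -> float:
    """P(exp(-lam*t) * X(t) > w), assuming X(t) ~ Geometric(exp(-lam*t)) on {1, 2, ...}."""
    p = math.exp(-lam * t)
    n = math.floor(w / p)   # X(t) > w/p  iff  X(t) >= floor(w/p) + 1
    return (1.0 - p) ** n   # P(X(t) >= n + 1) = (1 - p)^n


if __name__ == "__main__":
    for t in (1.0, 3.0, 10.0):
        print(f"t={t:4.1f}  P(e^-t X(t) > 1) = {yule_tail(1.0, t):.6f}   e^-1 = {math.exp(-1.0):.6f}")
```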

Proof of Proposition 1.4. The $k = 1$ case was proved by Angerer [2], so we may fix $k \ge 2$. Let $I_k$ denote the $k$th individual to enter the population. Let $D_{k,N}$ be the number of descendants of $I_k$ in the population at time $T_N$, when the total population size reaches $N$, and let $G_{k,N}$ be the number of those descendants having the same type as $I_k$. It follows from (5.1) that
\[
\lim_{N \to \infty} \frac{D_{k,N}}{N} = B_k \quad \text{a.s.},
\]
where $B_k$ has the Beta$(1, k-1)$ distribution. Also, by the same argument as in the $k = 1$ case, we have
\[
\lim_{N \to \infty} \frac{G_{k,N}}{(D_{k,N})^{1-r}} = M_k \quad \text{a.s.},
\]
where $M_k$ has the Mittag-Leffler distribution with parameter $1 - r$. Moreover, since the descendants of $I_k$ form a Yule process and mutations are neutral, $M_k$ and $B_k$ are independent.

Recall that $R_{k,N}$ is the number of type-$k$ individuals in the population at time $T_N$. On the event that the $k$th individual born is a mutant, we have $R_{k,N} = G_{k,N}$. Therefore
\[
Z_k = \lim_{N \to \infty} \frac{R_{k,N}}{N^{1-r}} = \lim_{N \to \infty} \frac{G_{k,N}}{(D_{k,N})^{1-r}}\left(\frac{D_{k,N}}{N}\right)^{1-r} = M_kB_k^{1-r}
\]
almost surely on the event that the $k$th individual born is a mutant. Proposition 1.4 follows because $M_k$ and $B_k$ are independent of the event that the $k$th individual is a mutant. □

It remains to prove Proposition 1.5. We will need the following lemma. 

Lemma 5.1. Given $\varepsilon > 0$, there exists a positive integer $L$ such that for sufficiently large $N$,
\[
\sum_{k=L}^N P(R_{k,N} > N^{1-r}) < \varepsilon. \tag{5.2}
\]

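Proof. We have $R_{k,N} = NV_kX_k$, so $P(R_{k,N} > N^{1-r}) = P(V_kX_k > N^{-r})$. From (4.9) with $S = N^{1-r}$ and $\delta = 1/2$, we get
\[
\sum_{k=L}^N P\left(W_k\left|X_k - \left(\frac{k}{N}\right)^r\right| > \frac{N^{-r}}{4}\right) \le \frac{2C}{1-r}(L - 1)^{-(2-2r)}, \tag{5.3}
\]
which is less than $\varepsilon/3$ for sufficiently large $L$. By Lemma 4.4, again with $S = N^{1-r}$ and $\delta = 1/2$,
\[
\sum_{k=L}^N P\left(|X_k(V_k - W_k)| > \frac{N^{-r}}{4}\right) < \frac{\varepsilon}{3} \tag{5.4}
\]
for sufficiently large $N$ as long as $L \ge 2$, because the 1 on the right-hand side of (4.10) comes from the $k = 1$ term. Finally, (4.16) implies that for $L \ge 2$,
\[
\sum_{k=L}^N P\left(W_k\left(\frac{k}{N}\right)^r > \frac{N^{-r}}{2}\right) = \sum_{k=L}^N r\left(1 - \frac{1}{2k^r}\right)^{k-1}, \tag{5.5}
\]
which is also at most $\varepsilon/3$ for sufficiently large $L$. The lemma follows from (5.3), (5.4) and (5.5). □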

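We now review some facts about the Mittag-Leffler distribution. Let $X$ be a stable random variable satisfying $E[e^{-\lambda X}] = e^{-\lambda^\alpha}$, where $0 < \alpha < 1$. Then, it is well known (see [30], Section 0.5) that $X$ is nonnegative and has density
\[
f_\alpha(x) = -\frac{1}{\pi}\sum_{k=0}^\infty \frac{(-1)^k}{k!}\sin(\pi\alpha k)\,\frac{\Gamma(\alpha k + 1)}{x^{\alpha k + 1}}, \qquad x > 0. \tag{5.6}
\]
It follows from [36] that if $A_1 = \alpha^{1/(2(1-\alpha))}(\cos\frac{\pi\alpha}{2})^{-1/(2(1-\alpha))}[2\pi(1 - \alpha)]^{-1/2}$ and $A_2 = (1 - \alpha)\alpha^{\alpha/(1-\alpha)}(\cos\frac{\pi\alpha}{2})^{-1/(1-\alpha)}$, then
\[
f_\alpha(x) \sim A_1x^{-(2-\alpha)/(2(1-\alpha))}\exp\bigl(-A_2x^{-\alpha/(1-\alpha)}\bigr),
\]
where "$\sim$" means that the ratio of the two sides tends to 1 as $x \downarrow 0$. The Mittag-Leffler distribution with parameter $\alpha \in (0, 1)$ is the distribution of $Y = X^{-\alpha}$. Therefore, if $g_\alpha$ denotes the density of $Y$, a change of variables gives
\[
g_\alpha(x) = \frac{f_\alpha(x^{-1/\alpha})}{\alpha x^{1 + 1/\alpha}} \sim \frac{A_1}{\alpha}\,x^{1/(2(1-\alpha)) - 1}\exp\bigl(-A_2x^{1/(1-\alpha)}\bigr), \tag{5.7}
\]
where "$\sim$" means that the ratio of the two sides tends to 1 as $x \to \infty$. In the following proof, $C$ is a positive constant whose value may change from line to line.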

Proof of Proposition 1.5. Let $g$ be the density of $M$, which has the Mittag-Leffler distribution with parameter $1 - r$. By (5.7), there exists a constant $C$ such that $g(x) \le Cx^{1/2r - 1}e^{-A_2x^{1/r}}$ for all $x \ge 1$. Therefore, if $x \ge 1$, then, making the substitution $y = A_2z^{1/r}$, we get
\begin{align*}
P(M > x) &\le C\int_x^\infty z^{1/2r - 1}e^{-A_2z^{1/r}}\,dz \\
&= \frac{Cr}{A_2^{1/2}}\int_{A_2x^{1/r}}^\infty y^{-1/2}e^{-y}\,dy. \tag{5.8}
\end{align*}

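Fix a positive integer $L$. It follows from Proposition 1.4 and (5.8) that there is a constant $C$ such that
\begin{align*}
\lim_{N \to \infty}\sum_{k=2}^L P(R_{k,N} > xN^{1-r}) &= r\sum_{k=2}^L P(MB_k^{1-r} > x) \\
&= r\sum_{k=2}^L E\bigl[P(M > x/B_k^{1-r} \mid B_k)\bigr] \\
&\le Cr\sum_{k=2}^L E\bigl[e^{-A_2(x/B_k^{1-r})^{1/r}}\bigr] \\
&= Cr\sum_{k=2}^L \int_0^1 (k-1)(1 - y)^{k-2}e^{-A_2(x/y^{1-r})^{1/r}}\,dy.
\end{align*}
Note that
\[
\sum_{k=2}^\infty (k-1)(1 - y)^{k-2} = \sum_{k=2}^\infty\sum_{i=1}^{k-1}(1 - y)^{k-2} = \sum_{i=1}^\infty\sum_{k=i+1}^\infty (1 - y)^{k-2} = \sum_{i=1}^\infty \frac{(1 - y)^{i-1}}{y} = \frac{1}{y^2}.
\]
Therefore, making the substitution $z = A_2(x/y^{1-r})^{1/r}$ and using the fact that, for all real numbers $b$, there is a $C > 0$ such that $z^be^{-z} \le Ce^{-z/2}$ for all $z \ge A_2$, we get
\begin{align*}
\lim_{N \to \infty}\sum_{k=2}^L P(R_{k,N} > xN^{1-r}) &\le Cr\int_0^1 y^{-2}e^{-A_2(x/y^{1-r})^{1/r}}\,dy \\
&= \frac{Cr^2}{(1 - r)A_2^{r/(1-r)}x^{1/(1-r)}}\int_{A_2x^{1/r}}^\infty z^{r/(1-r) - 1}e^{-z}\,dz. \tag{5.9}
\end{align*}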
By combining (5.9) with (5.8) for the $k = 1$ case, we get
\[
\lim_{N \to \infty}\sum_{k=1}^L P(R_{k,N} > xN^{1-r}) \le \tfrac{1}{2}C_1e^{-C_2x^{1/r}},
\]
where $C_1$ and $C_2$ are constants that do not depend on $L$. The proposition follows by letting $\varepsilon = \tfrac{1}{2}C_1e^{-C_2x^{1/r}}$ and choosing $L$ as in Lemma 5.1 such that (5.2) holds for sufficiently large $N$. □